1.
Brief Bioinform ; 25(2)2024 Jan 22.
Article in English | MEDLINE | ID: mdl-38366802

ABSTRACT

Anti-coronavirus peptides (ACVPs) represent a relatively novel approach to inhibiting the adsorption and fusion of the virus with human cells. Several peptide-based inhibitors have shown promise as potential therapeutic drug candidates. However, identifying such peptides in laboratory experiments is both costly and time-consuming. Therefore, there is growing interest in using computational methods to predict ACVPs. Here, we describe a model for the prediction of ACVPs, FEOpti-ACVP, that is based on the combination of feature engineering (FE) optimization and deep representation learning. FEOpti-ACVP was pre-trained using two feature extraction frameworks. In the next step, several machine learning approaches were tested to construct the final algorithm. The final version of FEOpti-ACVP outperformed existing methods used for ACVP prediction and has the potential to become a valuable tool in ACVP drug design. A user-friendly webserver of FEOpti-ACVP can be accessed at http://servers.aibiochem.net/soft/FEOpti-ACVP/.


Subjects
Algorithms, Peptides, Humans, Amino Acid Sequence, Peptides/pharmacology, Machine Learning
2.
BMC Bioinformatics ; 25(1): 56, 2024 Feb 02.
Article in English | MEDLINE | ID: mdl-38308205

ABSTRACT

BACKGROUND: Genome-wide association studies have successfully identified genetic variants associated with human disease. Various statistical approaches based on penalized and machine learning methods have recently been proposed for disease prediction. In this study, we evaluated the performance of several such methods for predicting asthma using the Korean Chip (KORV1.1) from the Korean Genome and Epidemiology Study (KoGES). RESULTS: First, single-nucleotide polymorphisms were selected via single-variant tests using logistic regression with adjustment for several epidemiological factors. Next, we evaluated the following methods for disease prediction: ridge, least absolute shrinkage and selection operator, elastic net, smoothly clipped absolute deviation, support vector machine, random forest, boosting, bagging, naïve Bayes, and k-nearest neighbor. Finally, we compared their predictive performance based on the area under the receiver operating characteristic curve, precision, recall, F1-score, Cohen's kappa, balanced accuracy, error rate, Matthews correlation coefficient, and area under the precision-recall curve. Additionally, three oversampling algorithms were used to deal with class imbalance. CONCLUSIONS: Our results show that penalized methods exhibit better predictive performance for asthma than machine learning methods. On the other hand, in the oversampling study, random forest and boosting methods overall showed better prediction performance than penalized methods.


Subjects
Algorithms, Genome-Wide Association Study, Humans, Bayes Theorem, Machine Learning, Republic of Korea/epidemiology
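Several of the comparison metrics named in the abstract above can be derived from a single binary confusion matrix. The following is a minimal sketch using only the Python standard library; the function name `binary_metrics` and the example counts are hypothetical and are not taken from the study.

```python
import math


def _safe_div(numerator, denominator):
    """Return numerator / denominator, or 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def binary_metrics(tp, fp, fn, tn):
    """Compute imbalance-aware metrics from binary confusion-matrix counts."""
    total = tp + fp + fn + tn
    recall = _safe_div(tp, tp + fn)          # sensitivity
    specificity = _safe_div(tn, tn + fp)
    precision = _safe_div(tp, tp + fp)
    f1 = _safe_div(2 * tp, 2 * tp + fp + fn)
    balanced_accuracy = (recall + specificity) / 2
    # Matthews correlation coefficient
    mcc_denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = _safe_div(tp * tn - fp * fn, mcc_denominator)
    # Cohen's kappa: observed agreement corrected for chance agreement
    observed = _safe_div(tp + tn, total)
    expected = _safe_div((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp), total ** 2)
    kappa = _safe_div(observed - expected, 1 - expected)
    return {
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "balanced_accuracy": balanced_accuracy,
        "error_rate": _safe_div(fp + fn, total),
        "mcc": mcc,
        "kappa": kappa,
    }


# Hypothetical counts for an imbalanced problem (60 cases, 940 controls).
metrics = binary_metrics(tp=40, fp=10, fn=20, tn=930)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With a rare positive class, the error rate can look small while recall, MCC, and kappa remain modest, which is one reason such studies report several metrics side by side.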
3.
Respir Res ; 25(1): 199, 2024 May 08.
Article in English | MEDLINE | ID: mdl-38720331

ABSTRACT

BACKGROUND: Bronchopulmonary dysplasia-associated pulmonary hypertension (BPD-PH) remains a devastating clinical complication that seriously affects the therapeutic outcome of preterm infants. Hence, early prevention and timely diagnosis prior to pathological change are key to reducing morbidity and improving prognosis. Our primary objective was to use machine learning techniques to build predictive models that could accurately identify BPD infants at risk of developing PH. METHODS: The data used in this study were collected from the neonatology departments of four tertiary-level hospitals in China. To address the issue of imbalanced data, the synthetic minority over-sampling technique (SMOTE), an oversampling algorithm, was applied to improve the model. RESULTS: Seven hundred sixty-one clinical records were collected in our study. Following data pre-processing and feature selection, 5 of the 46 features were used to build models: duration of invasive respiratory support (days), severity of BPD, ventilator-associated pneumonia, pulmonary hemorrhage, and early-onset PH. Four machine learning models were applied, and after comprehensive comparison one model was ultimately selected. The model achieved 93.8% sensitivity, 85.0% accuracy, and an AUC of 0.933. A score of the logistic regression formula greater than 0 was identified as a warning sign of BPD-PH. CONCLUSIONS: We comprehensively compared different machine learning models and ultimately obtained a good prognostic model that can support pediatric clinicians in making an early diagnosis and formulating a better treatment plan for pediatric patients with BPD-PH.


Subjects
Bronchopulmonary Dysplasia, Pulmonary Hypertension, Machine Learning, Humans, Bronchopulmonary Dysplasia/diagnosis, Newborn Infant, Pulmonary Hypertension/diagnosis, Male, Female, Retrospective Studies, Extremely Premature Infant, Premature Infant
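SMOTE, which the abstract above uses to balance the data, creates synthetic minority samples by interpolating between a minority sample and one of its nearest minority neighbours. The sketch below illustrates that core idea with the standard library only; it is not the study's implementation, and the function name `smote_sketch`, the value of `k`, and the sample points are assumptions for illustration.

```python
import math
import random


def smote_sketch(minority, n_new, k=3, seed=0):
    """Generate n_new synthetic points by interpolating between minority neighbours."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # The k nearest minority neighbours of the chosen base point (excluding itself).
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from base to neighbour
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Hypothetical two-feature minority class.
minority_points = [(1.0, 2.0), (1.5, 1.8), (2.0, 2.5), (1.2, 2.2)]
for point in smote_sketch(minority_points, n_new=3, k=2):
    print(point)
```

Because every synthetic point lies on a segment between two real minority samples, the new points stay inside the region already occupied by the minority class. Oversampling is normally applied to the training split only, so that evaluation data contain no synthetic records.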
4.
Biometrics ; 80(2)2024 Mar 27.
Article in English | MEDLINE | ID: mdl-38591365

ABSTRACT

A spatial sampling design determines where sample locations are placed in a study area so that population parameters can be estimated with relatively high precision. If the response variable has spatial trends, spatially balanced or well-spread designs give precise results for commonly used estimators. This article proposes a new method that draws well-spread samples over arbitrary auxiliary spaces and can be used for master sampling applications. All we require is a measure of the distance between population units. Numerical results show that the method generates well-spread samples and compares favorably with existing designs. We provide an example application using several auxiliary variables to estimate total aboveground biomass over a large study area in Eastern Amazonia, Brazil. Multipurpose surveys are also considered, where the totals of aboveground biomass, primary production, and clay content (3 responses) are estimated from a single well-spread sample over the auxiliary space.


Subjects
Sample Size, Surveys and Questionnaires
5.
BMC Med Res Methodol ; 24(1): 123, 2024 Jun 03.
Article in English | MEDLINE | ID: mdl-38831346

ABSTRACT

In contemporary society, depression has emerged as a prominent mental disorder that exhibits exponential growth and exerts a substantial influence on premature mortality. Although numerous studies have applied machine learning methods to forecast signs of depression, only a limited number have taken into account the severity level as a multiclass variable. Moreover, an equal distribution of data among all classes rarely occurs in practice, so the inevitable class imbalance is considered a substantial challenge in this domain. This research therefore emphasizes the significance of addressing class imbalance in the context of multiple classes. We introduce a new approach, feature group partitioning (FGP), in the data preprocessing phase, which effectively reduces the dimensionality of features to a minimum. This study used synthetic oversampling techniques, specifically the Synthetic Minority Over-sampling Technique (SMOTE) and Adaptive Synthetic (ADASYN) sampling, for class balancing. The dataset used in this research was collected from university students by administering the Burn Depression Checklist (BDC). For methodological modifications, we implemented heterogeneous ensemble learning (stacking), homogeneous ensemble learning (bagging), and five distinct supervised machine learning algorithms. The issue of overfitting was mitigated by evaluating the accuracy on the training, validation, and testing datasets. To assess the effectiveness of the prediction models, balanced accuracy, sensitivity, specificity, precision, and F1-score were used. Overall, a comprehensive analysis demonstrates the difference between the Conventional Depression Screening (CDS) and FGP approaches. In summary, the results show that the stacking classifier for FGP with SMOTE yields the highest balanced accuracy, 92.81%. The empirical evidence demonstrates that the FGP approach, when combined with SMOTE, is able to produce better performance in predicting the severity of depression. Most importantly, the optimization of the training time of the FGP approach for all of the classifiers is a significant achievement of this research.


Subjects
Algorithms, Depression, Machine Learning, Humans, Depression/diagnosis, Severity of Illness Index, Sensitivity and Specificity, Female
6.
Network ; : 1-25, 2024 Jun 21.
Article in English | MEDLINE | ID: mdl-38904211

ABSTRACT

Cloud computing (CC) is a coming revolution in the information technology (IT) and communication field. Security and internet connectivity are the major factors slowing the proliferation of CC. Recently, a new kind of distributed denial of service (DDoS) attack, known as the Economic Denial of Sustainability (EDoS) attack, has been emerging. Though EDoS attacks are small at the moment, they can be expected to grow in the near future in tandem with the growth of cloud usage. Here, an EfficientNet-B3-Attn-2 fused Deep Quantum Neural Network (EfficientNet-DQNN) is presented for EDoS detection. Initially, the cloud is simulated, and the input log file is then fed to data pre-processing, which is carried out using Z-Score Normalization (ZSN). Afterwards, feature fusion (FF) is accomplished based on a Deep Neural Network (DNN) with Kulczynski similarity. Then, data augmentation (DA) is executed by oversampling based on the Synthetic Minority Over-sampling Technique (SMOTE). Finally, attack detection is conducted using EfficientNet-DQNN, which is formed by incorporating EfficientNet-B3-Attn-2 with DQNN. EfficientNet-DQNN attained an F1-score of 89.8%, an accuracy of 90.4%, a precision of 91.1%, and a recall of 91.2% on the BOT-IOT dataset with a K-fold value of 9.
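The Z-score normalization step mentioned in the abstract above rescales each feature to zero mean and unit standard deviation. A minimal sketch follows, assuming the population standard deviation is used; the abstract does not state which variant the authors applied, and the function name and sample values are hypothetical.

```python
import statistics


def z_score_normalize(values):
    """Rescale values to zero mean and unit (population) standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        # A constant feature carries no spread; map every value to 0.0.
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]


# Hypothetical values of one numeric log feature.
feature = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
print(z_score_normalize(feature))
```

In practice the mean and standard deviation are usually estimated on the training folds and then reused on the held-out fold, so that no information from the test data enters the scaling.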

7.
Chem Pharm Bull (Tokyo) ; 72(2): 173-178, 2024.
Article in English | MEDLINE | ID: mdl-38296560

ABSTRACT

Histone deacetylase 8 (HDAC8) is a zinc-dependent HDAC that catalyzes the deacetylation of nonhistone proteins. It is involved in cancer development and HDAC8 inhibitors are promising candidates as anticancer agents. However, most reported HDAC8 inhibitors contain a hydroxamic acid moiety, which often causes mutagenicity. Therefore, we used machine learning for drug screening and attempted to identify non-hydroxamic acids as HDAC8 inhibitors. In this study, we established a prediction model based on the random forest (RF) algorithm for screening HDAC8 inhibitors because it exhibited the best predictive accuracy in the training dataset, including data generated by the synthetic minority over-sampling technique (SMOTE). Using the trained RF-SMOTE model, we screened the Osaka University library for compounds and selected 50 virtual hits. However, the 50 hits in the first screening did not show HDAC8-inhibitory activity. In the second screening, using the RF-SMOTE model, which was established by retraining the dataset including 50 inactive compounds, we identified non-hydroxamic acid 12 as an HDAC8 inhibitor with an IC50 of 842 nM. Interestingly, its IC50 values for HDAC1 and HDAC3-inhibitory activity were 38 and 12 µM, respectively, showing that compound 12 has high HDAC8 selectivity. Using machine learning, we expanded the chemical space for HDAC8 inhibitors and identified non-hydroxamic acid 12 as a novel HDAC8 selective inhibitor.


Subjects
Antineoplastic Agents, Histone Deacetylase Inhibitors, Humans, Histone Deacetylase Inhibitors/pharmacology, Histone Deacetylase Inhibitors/chemistry, Preclinical Drug Evaluation, Histone Deacetylases/metabolism, Antineoplastic Agents/pharmacology, Hydroxamic Acids/pharmacology, Hydroxamic Acids/chemistry, Machine Learning, Repressor Proteins
8.
Sensors (Basel) ; 24(2)2024 Jan 05.
Article in English | MEDLINE | ID: mdl-38257412

ABSTRACT

In this study, we propose an augmentation method for machine learning based on relabeling data in caregiving and nursing staff indoor localization with Bluetooth Low Energy (BLE) technology. Indoor localization is used to monitor staff-to-patient assistance in caregiving and to gain insights into workload management. However, improving accuracy is challenging when only a limited amount of data is available for training. In this paper, we propose a data augmentation method that reuses the Received Signal Strength (RSS) from different beacons by relabeling it to the locations with fewer samples, resolving data imbalance. Standard deviation and Kullback-Leibler divergence between minority and majority classes are used to measure signal patterns and find matching beacons to relabel. By matching beacons between classes, two variations of relabeling are implemented, namely full and partial matching. The performance is evaluated using the real-world dataset we collected over five days in a nursing care facility equipped with 25 BLE beacons. A Random Forest model is used for location recognition, and performance is compared using the weighted F1-score to account for class imbalance. By increasing the beacon data with our proposed relabeling method for data augmentation, we achieve a higher minority class F1-score compared to augmentation with Random Sampling, Synthetic Minority Oversampling Technique (SMOTE), and Adaptive Synthetic Sampling (ADASYN). Our proposed method makes use of collected beacon data by leveraging majority class samples. Full matching demonstrated a 6 to 8% improvement over the original baseline in overall weighted F1-score.


Subjects
Machine Learning, Psychological Recognition, Humans, Data Collection, Research Design, Technology
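The abstract above compares signal patterns between classes with the Kullback-Leibler divergence. As a hedged illustration, the sketch below computes KL(P || Q) for two discrete distributions, for example histograms of RSS values over the same bins. How the authors binned or modeled the signals is not stated in the abstract, so the histogram inputs and the `eps` floor are assumptions.

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """Return KL(P || Q) in nats for two histograms over the same bins."""
    if len(p) != len(q):
        raise ValueError("p and q must have the same number of bins")
    p_total, q_total = sum(p), sum(q)
    divergence = 0.0
    for p_count, q_count in zip(p, q):
        p_prob = p_count / p_total
        # Floor q to avoid division by zero when a bin is empty in Q.
        q_prob = max(q_count / q_total, eps)
        if p_prob > 0:
            divergence += p_prob * math.log(p_prob / q_prob)
    return divergence


# Hypothetical RSS histograms for a minority-class and a majority-class beacon.
minority_hist = [5, 5]
majority_hist = [9, 1]
print(kl_divergence(minority_hist, majority_hist))
print(kl_divergence(majority_hist, minority_hist))
```

The two printed values differ because the KL divergence is not symmetric, so the direction of comparison (minority against majority, or the reverse) is a design choice.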
9.
Sensors (Basel) ; 24(3)2024 Jan 23.
Article in English | MEDLINE | ID: mdl-38339452

ABSTRACT

Advancements in sensing technology have expanded the capabilities of both wearable devices and smartphones, which are now commonly equipped with inertial sensors such as accelerometers and gyroscopes. Initially, these sensors were used for device feature advancement, but now, they can be used for a variety of applications. Human activity recognition (HAR) is an interesting research area that can be used for many applications like health monitoring, sports, fitness, medical purposes, etc. In this research, we designed an advanced system that recognizes different human locomotion and localization activities. The data were collected from raw sensors that contain noise. In the first step, we detail our noise removal process, which employs a Chebyshev type 1 filter to clean the raw sensor data, and then the signal is segmented by utilizing Hamming windows. After that, features were extracted for different sensors. To select the best feature for the system, the recursive feature elimination method was used. We then used SMOTE data augmentation techniques to solve the imbalanced nature of the Extrasensory dataset. Finally, the augmented and balanced data were sent to a long short-term memory (LSTM) deep learning classifier for classification. The datasets used in this research were Real-World Har, Real-Life Har, and Extrasensory. The presented system achieved 89% for Real-Life Har, 85% for Real-World Har, and 95% for the Extrasensory dataset. The proposed system outperforms the available state-of-the-art methods.


Subjects
Exercise, Wearable Electronic Devices, Humans, Locomotion, Human Activities, Psychological Recognition
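The segmentation step described in the abstract above can be sketched as sliding a Hamming window over the filtered signal. The window length and hop size below are hypothetical, since the abstract does not report them, and the Chebyshev filtering stage is omitted from this sketch.

```python
import math


def hamming_window(length):
    """Return the Hamming window coefficients for the given length."""
    if length == 1:
        return [1.0]
    return [
        0.54 - 0.46 * math.cos(2 * math.pi * n / (length - 1))
        for n in range(length)
    ]


def segment_signal(signal, window_size, hop):
    """Split a signal into overlapping frames, each tapered by a Hamming window."""
    window = hamming_window(window_size)
    segments = []
    for start in range(0, len(signal) - window_size + 1, hop):
        frame = signal[start:start + window_size]
        segments.append([sample * weight for sample, weight in zip(frame, window)])
    return segments


# Hypothetical accelerometer trace: 10 samples, frames of 4 samples, 50% overlap.
trace = [0.1, 0.3, 0.2, 0.5, 0.9, 0.7, 0.4, 0.2, 0.1, 0.0]
for frame in segment_signal(trace, window_size=4, hop=2):
    print([round(value, 3) for value in frame])
```

Tapering each frame toward its edges reduces the discontinuities that plain rectangular slicing would introduce before feature extraction.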
10.
Sensors (Basel) ; 24(11)2024 Jun 03.
Article in English | MEDLINE | ID: mdl-38894404

ABSTRACT

The interpretability of gait analysis studies in people with rare diseases, such as those with primary hereditary cerebellar ataxia (pwCA), is frequently limited by the small sample sizes and unbalanced datasets. The purpose of this study was to assess the effectiveness of data balancing and generative artificial intelligence (AI) algorithms in generating synthetic data reflecting the actual gait abnormalities of pwCA. Gait data of 30 pwCA (age: 51.6 ± 12.2 years; 13 females, 17 males) and 100 healthy subjects (age: 57.1 ± 10.4; 60 females, 40 males) were collected at the lumbar level with an inertial measurement unit. Subsampling, oversampling, synthetic minority oversampling, generative adversarial networks, and conditional tabular generative adversarial networks (ctGAN) were applied to generate datasets to be input to a random forest classifier. Consistency and explainability metrics were also calculated to assess the coherence of the generated dataset with known gait abnormalities of pwCA. ctGAN significantly improved the classification performance compared with the original dataset and traditional data augmentation methods. ctGAN are effective methods for balancing tabular datasets from populations with rare diseases, owing to their ability to improve diagnostic models with consistent explainability.


Subjects
Algorithms, Artificial Intelligence, Cerebellar Ataxia, Gait, Rare Diseases, Humans, Female, Male, Middle Aged, Gait/physiology, Cerebellar Ataxia/genetics, Cerebellar Ataxia/physiopathology, Cerebellar Ataxia/diagnosis, Adult, Gait Analysis/methods, Aged
11.
BMC Bioinformatics ; 24(1): 129, 2023 Apr 04.
Article in English | MEDLINE | ID: mdl-37016308

ABSTRACT

BACKGROUND: Identification of hot spots in protein-DNA binding interfaces is extremely important for understanding the underlying mechanisms of protein-DNA interactions and for drug design. Experimental methods for identifying hot spots are time-consuming and expensive, and most existing computational methods predict hot spots from traditional protein-DNA features and are unable to make full use of the effective information in those features. RESULTS: In this work, a method named WTL-PDH is proposed for hot spot prediction. To deal with the unbalanced dataset, we used the Synthetic Minority Over-sampling Technique to generate minority class samples and balance the dataset. First, we extracted the solvent accessible surface area features and structural features, and then processed the traditional features using discrete wavelet transform and wavelet packet transform to extract the wavelet energy information and wavelet entropy information, obtaining a total of 175 feature dimensions. To obtain the best feature subset, we systematically evaluated these features under various feature selection strategies. Finally, a light gradient boosting machine (LightGBM) was used to establish the model. CONCLUSIONS: Our method achieved good results on an independent test set, with AUC, MCC, and F1 scores of 0.838, 0.533, and 0.750, respectively. WTL-PDH achieves generally better performance in predicting hot spots than state-of-the-art methods. The dataset and source code are available at https://github.com/chase2555/WTL-PDH .


Subjects
Software, Wavelet Analysis, Molecular Models, Protein Databases, Protein Binding, Algorithms
12.
J Med Internet Res ; 25: e43734, 2023 Feb 07.
Article in English | MEDLINE | ID: mdl-36749620

ABSTRACT

BACKGROUND: Machine learning offers new solutions for predicting life-threatening, unpredictable amiodarone-induced thyroid dysfunction. Traditional regression approaches for adverse-effect prediction without time-series consideration of features have yielded suboptimal predictions. Machine learning algorithms with multiple data sets at different time points may generate better performance in predicting adverse effects. OBJECTIVE: We aimed to develop and validate machine learning models for forecasting individualized amiodarone-induced thyroid dysfunction risk and to optimize a machine learning-based risk stratification scheme with a resampling method and readjustment of the clinically derived decision thresholds. METHODS: This study developed machine learning models using multicenter, delinked electronic health records. It included patients receiving amiodarone from January 2013 to December 2017. The training set was composed of data from Taipei Medical University Hospital and Wan Fang Hospital, while data from Taipei Medical University Shuang Ho Hospital were used as the external test set. The study collected stationary features at baseline and dynamic features at the first, second, third, sixth, ninth, 12th, 15th, 18th, and 21st months after amiodarone initiation. We used 16 machine learning models, including extreme gradient boosting, adaptive boosting, k-nearest neighbor, and logistic regression models, along with an original resampling method and 3 other resampling methods, including oversampling with the borderline-synthesized minority oversampling technique, undersampling-edited nearest neighbor, and over- and undersampling hybrid methods. The model performance was compared based on accuracy, precision, recall, F1-score, geometric mean, area under the curve of the receiver operating characteristic curve (AUROC), and the area under the precision-recall curve (AUPRC). Feature importance was determined by the best model. The decision threshold was readjusted to identify the best cutoff value, and a Kaplan-Meier survival analysis was performed. RESULTS: The training set contained 4075 patients from Taipei Medical University Hospital and Wan Fang Hospital, of whom 583 (14.3%) developed amiodarone-induced thyroid dysfunction, while the external test set included 2422 patients from Taipei Medical University Shuang Ho Hospital, of whom 275 (11.4%) developed amiodarone-induced thyroid dysfunction. The extreme gradient boosting oversampling machine learning model demonstrated the best predictive outcomes among all 16 models. The accuracy, precision, recall, F1-score, G-mean, AUPRC, and AUROC were 0.923, 0.632, 0.756, 0.688, 0.845, 0.751, and 0.934, respectively. After readjusting the cutoff, the best value was 0.627, and the F1-score reached 0.699. The best threshold was able to classify 286 of 2422 patients (11.8%) as high-risk subjects, among which 275 were true-positive patients in the testing set. A shorter treatment duration; higher levels of thyroid-stimulating hormone and high-density lipoprotein cholesterol; and lower levels of free thyroxin, alkaline phosphatase, and low-density lipoprotein were the most important features. CONCLUSIONS: Machine learning models combined with resampling methods can predict amiodarone-induced thyroid dysfunction and serve as a support tool for individualized risk prediction and clinical decision support.


Subjects
Amiodarone, Drug-Related Side Effects and Adverse Reactions, Humans, Retrospective Studies, Thyroid Gland, University Hospitals, Machine Learning
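The threshold readjustment described in the abstract above can be illustrated by sweeping candidate cutoffs over predicted probabilities and keeping the one with the highest F1-score. The sketch below is a generic illustration rather than the study's procedure; the function name, labels, and scores are hypothetical.

```python
def best_f1_threshold(y_true, scores):
    """Return the (threshold, f1) pair that maximizes F1 over the observed scores."""
    best_threshold, best_f1 = None, -1.0
    for threshold in sorted(set(scores)):
        tp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 1)
        fp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 0)
        fn = sum(1 for y, s in zip(y_true, scores) if s < threshold and y == 1)
        f1 = 2 * tp / (2 * tp + fp + fn) if tp else 0.0
        if f1 > best_f1:
            best_threshold, best_f1 = threshold, f1
    return best_threshold, best_f1


# Hypothetical labels (1 = event) and model probabilities.
labels = [0, 0, 1, 0, 1, 1]
probabilities = [0.1, 0.4, 0.35, 0.8, 0.65, 0.9]
threshold, f1 = best_f1_threshold(labels, probabilities)
print(threshold, f1)  # prints: 0.35 0.75
```

A cutoff chosen this way should be selected on validation data and then confirmed on an external set, as tuning it on the test set would overstate performance.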
13.
Sensors (Basel) ; 23(9)2023 Apr 30.
Article in English | MEDLINE | ID: mdl-37177605

ABSTRACT

E-commerce has increased online credit card usage. Similarly, credit card transactions have increased for physical sales and purchases. This has increased the risk of credit card fraud (CCF) and made payment networks more vulnerable. Therefore, there is a need to develop a precise CCF detector to control such online fraud. Many previous studies on CCF detection have reported good results. However, these solutions still lack performance, and most of them have ignored the outlier problem before applying feature selection and oversampling techniques for classification. The class imbalance problem is most prominent in available datasets of credit card transactions. Therefore, the proposed study first applies preprocessing to clean the feature set. Then, outliers are detected and normalized using the IQR method. The outlier-normalized data are fed to the Shapiro method for feature ranking, and the 20 most prominent features are selected. This selected feature set is then fed to the SMOTEN oversampling method, which increases the minority class instances and equalizes the positive and negative instances. Next, this cleaned feature set is fed to five ML classifiers, and four different splits of holdout validation are applied. Two experiments are conducted. In Experiment 1, the original data are fed to five ML classifiers with holdout validation, and the AUC reaches a maximum of 0.971. In Experiment 2, outliers are normalized, features are selected using the Shapiro method, and oversampling is performed using the SMOTEN method. This normalized and processed feature set is fed to five ML classifiers via holdout validation. The experimental results show an AUC of 1.00, which exceeds state-of-the-art studies and indicates that the proposed study achieves better results using this specific framework.
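The IQR-based outlier handling mentioned in the abstract above can be sketched as clipping values that fall outside the interquartile fences. The abstract does not say how outliers were "normalized", so clipping, the 1.5 multiplier, and the sample amounts below are assumptions made for illustration.

```python
import statistics


def clip_outliers_iqr(values, factor=1.5):
    """Clip values to the range [Q1 - factor * IQR, Q3 + factor * IQR]."""
    q1, _median, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    lower, upper = q1 - factor * iqr, q3 + factor * iqr
    return [min(max(v, lower), upper) for v in values]


# Hypothetical transaction amounts with one extreme value.
amounts = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 100.0]
print(clip_outliers_iqr(amounts))
```

Clipping keeps every record, which matters when the rare fraud cases are themselves often the extreme observations; dropping outliers instead could remove the very class being detected.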

14.
Sensors (Basel) ; 23(17)2023 Aug 29.
Article in English | MEDLINE | ID: mdl-37687952

ABSTRACT

With the rapid development of the Internet of Things (IoT), the frequency of attackers using botnets to control IoT devices in order to perform distributed denial-of-service attacks (DDoS) and other cyber attacks on the internet has significantly increased. In the actual attack process, the small percentage of attack packets in IoT leads to low accuracy of intrusion detection. Based on this problem, the paper proposes an oversampling algorithm, KG-SMOTE, based on Gaussian distribution and K-means clustering, which inserts synthetic samples through Gaussian probability distribution, extends the clustering nodes in minority class samples in the same proportion, increases the density of minority class samples, and improves the amount of minority class sample data in order to provide data support for IoT-based DDoS attack detection. Experiments show that the balanced dataset generated by this method effectively improves the intrusion detection accuracy in each category and effectively solves the data imbalance problem.

15.
Sensors (Basel) ; 23(6)2023 Mar 09.
Article in English | MEDLINE | ID: mdl-36991677

ABSTRACT

Blood pressure (BP) monitoring is vital in daily healthcare, especially for cardiovascular diseases. However, BP values are mainly acquired through a contact-sensing method, which is inconvenient and unfriendly for BP monitoring. This paper proposes an efficient end-to-end network for estimating BP values from a facial video to achieve remote BP estimation in daily life. The network first derives a spatiotemporal map of a facial video. Then, it regresses the BP ranges with a designed blood pressure classifier and simultaneously calculates the specific value with a blood pressure calculator in each BP range based on the spatiotemporal map. In addition, an innovative oversampling training strategy was developed to handle the problem of unbalanced data distribution. Finally, we trained the proposed blood pressure estimation network on a private dataset, MPM-BP, and tested it on a popular public dataset, MMSE-HR. As a result, the proposed network achieved a mean absolute error (MAE) and root mean square error (RMSE) of 12.35 mmHg and 16.55 mmHg on systolic BP estimations, and those for diastolic BP were 9.54 mmHg and 12.22 mmHg, which were better than the values obtained in recent works. It can be concluded that the proposed method has excellent potential for camera-based BP monitoring in the indoor scenarios in the real world.


Subjects
Blood Pressure Determination, Face, Blood Pressure/physiology
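The two error measures reported in the abstract above, mean absolute error (MAE) and root mean square error (RMSE), can be computed as in the following minimal sketch. The readings are hypothetical and are not data from the study.

```python
import math


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between reference and estimated values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def root_mean_square_error(y_true, y_pred):
    """Square root of the average squared difference."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


# Hypothetical systolic readings in mmHg: cuff reference vs. camera-based estimate.
reference = [120.0, 130.0, 110.0]
estimate = [125.0, 120.0, 110.0]
print(mean_absolute_error(reference, estimate))
print(root_mean_square_error(reference, estimate))
```

RMSE is never smaller than MAE on the same data, and a large gap between them indicates that a few large errors dominate.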
16.
Sensors (Basel) ; 23(7)2023 Mar 31.
Article in English | MEDLINE | ID: mdl-37050715

ABSTRACT

D-band (110-170 GHz) is a promising direction for future 6th generation mobile networks (6G) and high-speed mobile communication, since it has a large available bandwidth and can provide a peak rate of hundreds of Gbit/s. Compared with the traditional electrical approach, photonic millimeter wave (mm-wave) generation in D-band is more practical and effectively overcomes the bottleneck of electrical devices. However, long-distance D-band wireless transmission is still limited by some key factors such as large absorption loss and nonlinear noise. Deep neural network algorithms are regarded as an important technique to model the nonlinear wireless behavior, among which the study of complex-valued equalization is critical, especially in coherent detection systems. Moreover, probabilistic shaping is useful to improve the transmission capacity but also causes an imbalanced machine learning problem. In this paper, we propose a novel complex-valued neural network equalizer coupled with balanced random oversampling (ROS). Thanks to the adaptive deep learning method for probabilistic shaping-quadrature amplitude modulation (PS-QAM), we successfully realize a 135 GHz, 4-Gbaud PS-16QAM wireless transmission with a shaping entropy of 3.56 bit/symbol over 4.6 km. The bit error ratio (BER) of the 4-Gbaud PS-16QAM signal can be reduced below the 2 × 10⁻² threshold of soft-decision forward error correction (SD-FEC) with a 25% overhead. Therefore, we can achieve a net rate of 11.4 Gbit/s for D-band radio-over-fiber (ROF) delivery over a 4.6 km free-space wireless distance.
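The reported net rate can be reproduced from the figures in the abstract above with a simple accounting: symbol rate times shaping entropy gives the gross bit rate, which is then divided by one plus the FEC overhead. This is one plausible reading of how the 11.4 Gbit/s figure arises; other net-rate conventions for probabilistically shaped signals exist, and the function name is hypothetical.

```python
def net_bit_rate_gbps(symbol_rate_gbaud, entropy_bits_per_symbol, fec_overhead):
    """Gross rate (symbol rate x entropy) reduced by the FEC overhead fraction."""
    gross_rate = symbol_rate_gbaud * entropy_bits_per_symbol
    return gross_rate / (1 + fec_overhead)


# Values taken from the abstract: 4 Gbaud, 3.56 bit/symbol, 25% SD-FEC overhead.
net_rate = net_bit_rate_gbps(4.0, 3.56, 0.25)
print(round(net_rate, 1))  # prints: 11.4
```

The gross rate here is 14.24 Gbit/s, and removing the 25% overhead leaves about 11.39 Gbit/s, which matches the stated 11.4 Gbit/s after rounding.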

17.
Sensors (Basel) ; 24(1)2023 Dec 26.
Article in English | MEDLINE | ID: mdl-38202990

ABSTRACT

In the context of 6G technology, the Internet of Everything aims to create a vast network that connects both humans and devices across multiple dimensions. The integration of smart healthcare, agriculture, transportation, and homes is incredibly appealing, as it allows people to effortlessly control their environment through touch or voice commands. Consequently, with the increase in Internet connectivity, the security risk also rises. However, the future is centered on a six-fold increase in connectivity, necessitating the development of stronger security measures to handle the rapidly expanding concept of IoT-enabled metaverse connections. Various types of attacks, often orchestrated using botnets, pose a threat to the performance of IoT-enabled networks. Detecting anomalies within these networks is crucial for safeguarding applications from potentially disastrous consequences. The voting classifier is a machine learning (ML) model known for its effectiveness as it capitalizes on the strengths of individual ML models and has the potential to improve overall predictive performance. In this research, we proposed a novel classification technique based on the DRX approach that combines the advantages of the Decision tree, Random forest, and XGBoost algorithms. This ensemble voting classifier significantly enhances the accuracy and precision of network intrusion detection systems. Our experiments were conducted using the NSL-KDD, UNSW-NB15, and CIC-IDS2017 datasets. The findings of our study show that the DRX-based technique works better than the others. It achieved a higher accuracy of 99.88% on the NSL-KDD dataset, 99.93% on the UNSW-NB15 dataset, and 99.98% on the CIC-IDS2017 dataset, outperforming the other methods. Additionally, there is a notable reduction in the false positive rates to 0.003, 0.001, and 0.00012 for the NSL-KDD, UNSW-NB15, and CIC-IDS2017 datasets.
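The ensemble described in the abstract above combines the predictions of several base models by voting. The sketch below shows hard (majority) voting over per-model label lists. Whether the authors used hard or soft voting is not stated in the abstract, so hard voting is an assumption here, and the model outputs are hypothetical.

```python
from collections import Counter


def hard_vote(predictions_per_model):
    """Return the majority label for each sample across all models."""
    n_samples = len(predictions_per_model[0])
    if any(len(preds) != n_samples for preds in predictions_per_model):
        raise ValueError("every model must predict the same number of samples")
    voted = []
    for index in range(n_samples):
        votes = Counter(preds[index] for preds in predictions_per_model)
        voted.append(votes.most_common(1)[0][0])
    return voted


# Hypothetical outputs of three base models on three network flows.
decision_tree = ["attack", "normal", "normal"]
random_forest = ["attack", "attack", "normal"]
xgboost_model = ["normal", "attack", "normal"]
print(hard_vote([decision_tree, random_forest, xgboost_model]))
# prints: ['attack', 'attack', 'normal']
```

Using an odd number of base models avoids ties in a two-class problem; with more classes, a tie-breaking rule or probability-weighted (soft) voting would be needed.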

18.
Entropy (Basel) ; 25(3)2023 Mar 16.
Article in English | MEDLINE | ID: mdl-36981396

ABSTRACT

General target detection with deep learning has made tremendous strides in the past few years. However, small target detection often suffers from insufficient sample size and difficulty in extracting complete feature information. For safety during autonomous driving, distant signs and pedestrians need to be detected in driving scenes captured by car cameras. In the early stage of a medical lesion, when the lesion area is small, target detection is of great significance for detecting masses and tumors for accurate diagnosis and treatment. To deal with these problems, we propose a novel deep learning model, named CenterNet for small targets (ST-CenterNet). First, because the dataset contains little visual information on small targets, the extracted features are less discriminative. To overcome this shortcoming, the proposed selective small target replication algorithm (SSTRA) increases the number of small targets by selectively oversampling them. In addition, the difficulty of extracting shallow semantic information for small targets results in incomplete target feature information. Consequently, we developed a target adaptation feature extraction module (TAFEM), which conducts bottom-up and top-down bidirectional feature extraction by combining ResNet with the adaptive feature pyramid network (AFPN). The AFPN was added to address a limitation of the original feature extraction module, which can only extract feature information from the last layer. The experimental results demonstrate that the proposed method can accurately detect small-scale, distributed targets in images and simultaneously, at the pixel level, classify whether a subject is wearing a safety helmet. Compared with the original algorithm on the safety helmet wearing dataset (SHWD), we achieved a mean average precision (mAP) of 89.06% and 28.96 frames per second (FPS), an improvement of 18.08% mAP over the previous method.
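The abstract does not give the details of SSTRA, so the following is only a rough sketch of the general idea of selectively oversampling small targets. It assumes that a target counts as "small" when its bounding-box area is below 32 x 32 pixels (the convention used by the COCO benchmark, not necessarily by the authors) and that oversampling means repeating whole training samples. The `Sample` type and all parameter names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Bounding box as (x_min, y_min, x_max, y_max) in pixels.
Box = Tuple[float, float, float, float]


@dataclass
class Sample:
    image_id: str
    boxes: List[Box]


def box_area(box: Box) -> float:
    """Area of a box; degenerate boxes count as zero."""
    x_min, y_min, x_max, y_max = box
    return max(0.0, x_max - x_min) * max(0.0, y_max - y_min)


def has_small_target(sample: Sample, max_area: float) -> bool:
    """True if at least one annotated box is smaller than max_area."""
    return any(box_area(box) < max_area for box in sample.boxes)


def oversample_small_targets(
    samples: List[Sample],
    max_area: float = 32 * 32,  # assumed "small" threshold
    extra_copies: int = 2,      # assumed replication factor
) -> List[Sample]:
    """Repeat only those samples that contain a small target."""
    result: List[Sample] = []
    for sample in samples:
        result.append(sample)
        if has_small_target(sample, max_area):
            result.extend([sample] * extra_copies)
    return result
```

In practice the repeated samples would usually be passed through random augmentation so that the copies are not pixel-identical; that step is omitted here.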

19.
Int J Environ Sci Technol (Tehran) ; 20(5): 5333-5348, 2023.
Article in English | MEDLINE | ID: mdl-35603096

ABSTRACT

The survival of mankind cannot be imagined without air. Continuous development in almost all realms of modern human society has adversely affected air quality. Daily industrial, transport, and domestic activities add hazardous pollutants to our environment. Monitoring and predicting air quality have become essential in this era, especially in developing countries like India. In contrast to traditional methods, prediction technologies based on machine learning techniques have proved to be the most efficient tools for studying such modern hazards. The present work investigates six years of air pollution data from 23 Indian cities for air quality analysis and prediction. The dataset is preprocessed and key features are selected through correlation analysis. An exploratory data analysis is carried out to develop insights into various hidden patterns in the dataset, and the pollutants directly affecting the air quality index are identified. A significant fall in almost all pollutants is observed in the pandemic year, 2020. The data imbalance problem is solved with a resampling technique, and five machine learning models are employed to predict air quality. The results of these models are compared using standard metrics. The Gaussian Naive Bayes model achieves the highest accuracy while the Support Vector Machine model exhibits the lowest accuracy. The performances of these models are evaluated and compared through established performance parameters. The XGBoost model performed the best among the models and achieved the highest linearity between the predicted and actual data.
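Feature selection through correlation analysis, as mentioned in this abstract, might look roughly like the sketch below. It assumes Pearson correlation against the air quality index and a cut-off of 0.5 on the absolute value. Both choices, the column names, and the toy numbers are illustrative assumptions, not details or data from the study.

```python
import math
from typing import Dict, List, Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient; returns 0.0 for a constant column."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    norm_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    norm_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if norm_x == 0 or norm_y == 0:
        return 0.0
    return cov / (norm_x * norm_y)


def select_features(
    columns: Dict[str, Sequence[float]],
    target: Sequence[float],
    min_abs_corr: float = 0.5,  # assumed cut-off
) -> List[str]:
    """Keep the columns whose |correlation| with the target meets the cut-off."""
    return [
        name
        for name, values in columns.items()
        if abs(pearson(values, target)) >= min_abs_corr
    ]


if __name__ == "__main__":
    # Toy values for illustration only.
    aqi = [50, 100, 150, 200]
    pollutants = {"pm25": [10, 20, 30, 40], "noise": [5, 1, 1, 5]}
    print(select_features(pollutants, aqi))
```

Pearson correlation only captures linear association, so a pollutant with a strong but nonlinear effect on the index could be dropped by a filter like this one.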

20.
Stat Methods Appl ; : 1-35, 2023 Jun 13.
Article in English | MEDLINE | ID: mdl-37360255

ABSTRACT

A new class of sampling strategies is proposed that can be applied to population-based surveys targeting a rare trait that is unevenly spread over an area of interest. Our proposal is characterised by the ability to tailor the data collection to specific features and challenges of the survey at hand. It is based on integrating an adaptive component into a sequential selection, which aims both to intensify the detection of positive cases by exploiting spatial clustering and to provide a flexible framework to manage logistics and budget constraints. A class of estimators is also proposed to account for the selection bias; these are proved to be unbiased for the population mean (prevalence) as well as consistent and asymptotically normally distributed. Unbiased variance estimation is also provided. A ready-to-implement weighting system is developed for estimation purposes. Two special strategies included in the proposed class are presented, which are based on Poisson sampling and are proved to be more efficient. The selection of primary sampling units is also illustrated for tuberculosis prevalence surveys, which are recommended in many countries and supported by the World Health Organisation, as an emblematic example of the need for an improved sampling design. Simulation results are given for the tuberculosis application to illustrate the strengths and weaknesses of the proposed sequential adaptive sampling strategies with respect to the traditional cross-sectional non-informative sampling currently suggested by World Health Organisation guidelines.
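As background for the Poisson-sampling strategies mentioned above, the sketch below shows plain (non-adaptive) Poisson sampling together with the classical Horvitz-Thompson estimator of the population mean, which weights each sampled unit by the inverse of its inclusion probability. This is the textbook building block, not the sequential adaptive estimator proposed by the authors, and the function names are hypothetical.

```python
import random
from typing import List, Sequence


def poisson_sample(inclusion_probs: Sequence[float], rng: random.Random) -> List[int]:
    """Select each unit independently with its own inclusion probability."""
    return [i for i, p in enumerate(inclusion_probs) if rng.random() < p]


def horvitz_thompson_mean(
    values: Sequence[float],
    inclusion_probs: Sequence[float],
    sample_idx: Sequence[int],
) -> float:
    """Estimate the population mean from the sampled units only.

    Each sampled value is weighted by 1 / inclusion probability, and the
    weighted total is divided by the known population size N.
    """
    population_size = len(values)
    weighted_total = sum(values[i] / inclusion_probs[i] for i in sample_idx)
    return weighted_total / population_size


if __name__ == "__main__":
    rng = random.Random(0)
    trait = [1, 0, 0, 1, 0, 0, 0, 1]                  # 1 = positive case (toy values)
    probs = [0.8, 0.2, 0.2, 0.8, 0.2, 0.2, 0.2, 0.8]  # assumed inclusion probabilities
    chosen = poisson_sample(probs, rng)
    print(chosen, horvitz_thompson_mean(trait, probs, chosen))
```

Under Poisson sampling the sample size is random, so a single estimate can fall far from the true prevalence even though the estimator is unbiased when averaged over repeated samples.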
