ABSTRACT
Exploring microbial stress responses to drugs is crucial for the development of new therapeutic methods. While current artificial intelligence methodologies have expedited our understanding of potential microbial responses to drugs, existing models are constrained by imprecise representations of microbes and drugs. To this end, we combine a deep autoencoder with subgraph augmentation technology for the first time to propose a model called JDASA-MRD, which can identify potential, as-yet-unidentified responses of microbes to drugs. In the JDASA-MRD model, we begin by feeding the established similarity matrices of microbes and drugs into the deep autoencoder, enabling it to extract robust initial features of both microbes and drugs. Subsequently, we employ the MinHash and HyperLogLog algorithms to estimate the intersections and cardinalities of microbe and drug subgraphs, thereby deeply extracting the multi-hop neighborhood information of nodes. Finally, by integrating the initial node features with subgraph topological information, we leverage graph neural network technology to predict microbes' responses to drugs, offering a more effective solution to the 'over-smoothing' challenge. Comparative analyses on multiple public datasets confirm that the JDASA-MRD model's performance surpasses that of current state-of-the-art models. This research aims to offer a more profound insight into the adaptability of microbes to drugs and to furnish pivotal guidance for drug treatment strategies. Our data and code are publicly available at: https://github.com/ZZCrazy00/JDASA-MRD.
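The subgraph-sketching step can be illustrated with a pure-Python MinHash: the fraction of matching signature slots estimates the Jaccard overlap between two node sets (the paper pairs this with HyperLogLog for cardinality estimates). The neighborhoods and parameters below are illustrative assumptions, not taken from JDASA-MRD:

```python
import hashlib

def minhash_signature(items, num_hashes=128):
    """One min-hash per salted hash function over the item set."""
    return [
        min(int.from_bytes(hashlib.sha1(f"{seed}:{x}".encode()).digest()[:8], "big")
            for x in items)
        for seed in range(num_hashes)
    ]

def estimate_jaccard(sig_a, sig_b):
    """Fraction of matching signature slots estimates |A ∩ B| / |A ∪ B|."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)

# Two hypothetical multi-hop neighborhoods (node-id sets) of a microbe and a drug
hood_a = {f"n{i}" for i in range(0, 20)}    # 20 nodes
hood_b = {f"n{i}" for i in range(10, 40)}   # 30 nodes, 10 shared -> true Jaccard 0.25

# Estimate of the true Jaccard 0.25, within MinHash sampling error
est = estimate_jaccard(minhash_signature(hood_a), minhash_signature(hood_b))
```

Comparing fixed-size signatures instead of the full neighborhoods is what keeps multi-hop topology extraction tractable on large graphs.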
Subjects
Algorithms; Artificial Intelligence; Neural Networks, Computer
ABSTRACT
BACKGROUND: MicroRNA (miRNA) has been shown to play a key role in the occurrence and progression of diseases, making uncovering miRNA-disease associations vital for disease prevention and therapy. However, traditional laboratory methods for detecting these associations are slow, strenuous, expensive, and uncertain. Although numerous advanced algorithms have emerged, it remains a challenge to develop more effective methods to explore underlying miRNA-disease associations. RESULTS: In this study, we designed a novel approach based on a deep autoencoder and combined feature representation (DAE-CFR) to predict possible miRNA-disease associations. We began by creating integrated similarity matrices of miRNAs and diseases, performing a logistic function transformation, balancing positive and negative samples with k-means clustering, and constructing training samples. Then, a deep autoencoder was used to extract low-dimensional features from two kinds of feature representations for miRNAs and diseases, namely, original association information-based and similarity information-based. Next, we combined the resulting features for each miRNA-disease pair and used a logistic regression (LR) classifier to infer all unknown miRNA-disease interactions. Under fivefold and tenfold cross-validation (CV) frameworks, DAE-CFR not only outperformed six popular algorithms and nine classifiers, but also demonstrated superior performance on an additional dataset. Furthermore, case studies on three diseases (myocardial infarction, hypertension and stroke) confirmed the validity of DAE-CFR in practice. CONCLUSIONS: DAE-CFR achieved outstanding performance in predicting miRNA-disease associations and can provide evidence to inform biological experiments and clinical therapy.
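The balancing-and-classification steps of such a pipeline can be sketched with scikit-learn: cluster a large pool of unlabeled negatives into as many groups as there are positives, keep one representative per cluster, then fit the LR classifier. The data dimensions and the centroid-nearest selection rule are illustrative assumptions, not the authors' exact procedure:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
pos = rng.normal(loc=2.0, size=(50, 8))        # stand-in positive pair features
neg_pool = rng.normal(loc=-2.0, size=(500, 8)) # large unlabeled/negative pool

# Balance: one k-means cluster per positive; keep the member nearest each centroid
km = KMeans(n_clusters=len(pos), n_init=10, random_state=0).fit(neg_pool)
reps = []
for c in range(len(pos)):
    members = np.where(km.labels_ == c)[0]
    d = np.linalg.norm(neg_pool[members] - km.cluster_centers_[c], axis=1)
    reps.append(neg_pool[members[d.argmin()]])
neg = np.array(reps)

# Train the LR classifier on the balanced sample
X = np.vstack([pos, neg])
y = np.array([1] * len(pos) + [0] * len(neg))
clf = LogisticRegression().fit(X, y)
acc = clf.score(X, y)
```

Sampling one representative per cluster keeps the negatives diverse rather than simply subsampling them at random.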
Subjects
MicroRNAs; Humans; MicroRNAs/genetics; Computational Biology/methods; Algorithms; Genetic Predisposition to Disease
ABSTRACT
MicroRNAs (miRNAs) are closely related to a variety of human diseases: they not only regulate gene expression, but also play an important role in human life activities and serve as viable targets of small molecule drugs for disease treatment. Current computational techniques for predicting potential associations between small molecules and miRNAs are not sufficiently accurate. Here, we propose a new computational method based on a deep autoencoder and a scalable tree boosting model (DAESTB) to predict associations between small molecules and miRNAs. First, we construct a high-dimensional feature matrix by integrating small molecule-small molecule similarity, miRNA-miRNA similarity and known small molecule-miRNA associations. Second, we reduce the dimensionality of the integrated matrix using a deep autoencoder to obtain a latent feature representation of each small molecule-miRNA pair. Finally, a scalable tree boosting model is used to predict potential small molecule-miRNA associations. Experiments on two datasets demonstrate the superiority of DAESTB over various state-of-the-art methods, with DAESTB achieving the best AUC value. Furthermore, in three case studies, a large number of the associations predicted by DAESTB are confirmed by the publicly available literature. We envision that DAESTB could serve as a useful biological model for predicting potential small molecule-miRNA associations.
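As a minimal sketch of the dimensionality-reduction idea (the paper's autoencoder is deeper and nonlinear, and the tree boosting stage is omitted here), a linear autoencoder trained by gradient descent compresses 16-D rows that in fact lie on a 2-D subspace:

```python
import numpy as np

rng = np.random.default_rng(1)
# Synthetic stand-in for the integrated similarity features:
# 200 samples in 16-D that actually lie on a 2-D subspace
X = rng.normal(size=(200, 2)) @ rng.normal(size=(2, 16))

W_enc = rng.normal(scale=0.1, size=(16, 2))   # encoder weights
W_dec = rng.normal(scale=0.1, size=(2, 16))   # decoder weights

lr = 0.01
for _ in range(5000):
    H = X @ W_enc                   # low-dimensional codes
    R = H @ W_dec                   # reconstruction
    G = 2.0 * (R - X) / len(X)      # gradient of MSE w.r.t. R
    g_dec = H.T @ G                 # chain rule through the decoder
    g_enc = X.T @ (G @ W_dec.T)     # chain rule through the encoder
    W_enc -= lr * g_enc
    W_dec -= lr * g_dec

codes = X @ W_enc                   # compressed pair representations
mse = float(np.mean((codes @ W_dec - X) ** 2))
```

The 2-D `codes` would then be handed to the boosting classifier in place of the raw high-dimensional matrix.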
Subjects
MicroRNAs; Humans; Algorithms; Computational Biology/methods; Genetic Predisposition to Disease; MicroRNAs/genetics; MicroRNAs/metabolism; Models, Biological
ABSTRACT
Increasing evidence shows that the occurrence of human complex diseases is closely related to microRNA (miRNA) variation and imbalance. For this reason, predicting disease-related miRNAs is essential for the diagnosis and treatment of complex human diseases. Although some current computational methods can effectively predict potential disease-related miRNAs, their accuracy should be further improved. In our study, a new computational method via deep forest ensemble learning based on autoencoder (DFELMDA) is proposed to predict miRNA-disease associations. Specifically, a new feature representation strategy is proposed to obtain different types of feature representations (from miRNA and disease) for each miRNA-disease association. Then, two types of low-dimensional feature representations are extracted by two deep autoencoders for predicting miRNA-disease associations. Finally, two prediction scores of the miRNA-disease associations are obtained by the deep random forest and combined to determine the final results. DFELMDA is compared with several classical methods on the Human microRNA Disease Database (HMDD) dataset. Results reveal that the performance of this method is superior. The area under the receiver operating characteristic curve (AUC) values obtained by DFELMDA through 5-fold and 10-fold cross-validation are 0.9552 and 0.9560, respectively. In addition, case studies on three tumor types (colon, breast and lung) further demonstrate the excellent ability of DFELMDA to predict disease-associated miRNAs. Performance analysis shows that DFELMDA can be used as an effective computational tool for predicting miRNA-disease associations.
Subjects
MicroRNAs; Algorithms; Computational Biology/methods; Genetic Predisposition to Disease; Humans; Machine Learning; MicroRNAs/genetics
ABSTRACT
The compression method for wellbore trajectory data is crucial for monitoring wellbore stability. However, classical methods, such as those based on Huffman coding, compressed sensing, and Differential Pulse Code Modulation (DPCM), suffer from low real-time performance, low compression ratios, and large errors between the reconstructed data and the source data. To address these issues, a new compression method is proposed, leveraging a deep autoencoder for the first time to significantly improve the compression ratio. Additionally, the method reduces error by compressing and transmitting the residual data from the feature extraction process using quantization coding and Huffman coding. Furthermore, a mean filter based on the optimal standard deviation threshold is applied to further minimize error. Experimental results show that the proposed method achieves an average compression ratio of 4.05 for inclination and azimuth data, an improvement of 118.54% over the DPCM method. Meanwhile, the average mean square error of the proposed method is 76.88, a decrease of 82.46% compared with the DPCM method. Ablation studies confirm the effectiveness of the proposed improvements. These findings highlight the efficacy of the proposed method in enhancing wellbore stability monitoring performance.
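The residual-coding step can be sketched as quantization followed by Huffman coding; the residual values, quantization step, and code construction below are illustrative, not the paper's implementation:

```python
import heapq
from collections import Counter

def huffman_codes(symbols):
    """Build a Huffman code table (symbol -> bit string) from a sequence."""
    heap = [(freq, i, {sym: ""}) for i, (sym, freq) in enumerate(Counter(symbols).items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        f1, _, c1 = heapq.heappop(heap)   # two least-frequent subtrees
        f2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in c1.items()} | {s: "1" + c for s, c in c2.items()}
        heapq.heappush(heap, (f1 + f2, tie, merged))
        tie += 1
    return heap[0][2]

# Hypothetical residuals between source and autoencoder-reconstructed angles (degrees)
residuals = [0.03, -0.01, 0.02, 0.0, -0.02, 0.01, 0.0, 0.03]
step = 0.01                                        # quantization step size
quantized = [round(r / step) for r in residuals]   # map residuals to small integers
codes = huffman_codes(quantized)
bitstream = "".join(codes[q] for q in quantized)   # compressed residual payload
```

Because the quantized residuals concentrate on a few small integers, the Huffman table assigns them short codes, which is what makes transmitting the residuals cheap.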
ABSTRACT
BACKGROUND: Fraud, Waste, and Abuse (FWA) in medical claims have a negative impact on the quality and cost of healthcare. A major component of FWA in claims is procedure code overutilization, where one or more prescribed procedures may not be relevant to a given diagnosis and patient profile, resulting in unnecessary and unwarranted treatments and medical payments. This study aims to identify such unwarranted procedures from millions of healthcare claims. In the absence of labeled examples of unwarranted procedures, the study focused on the application of unsupervised machine learning techniques. METHODS: Experiments were conducted with deep autoencoders to find claims containing anomalous procedure codes indicative of FWA, and were compared against a baseline density-based clustering model. Diagnoses, procedures, and demographic data associated with healthcare claims were used as features for the models. A dataset of one hundred thousand claims sampled from a larger claims database was used to initially train and tune the models, followed by experiments on a dataset of thirty-three million claims. Experimental results show that the autoencoder model, when trained with a novel feature-weighted loss function, outperforms the density-based clustering approach in finding potential outlier procedure codes. RESULTS: Given the unsupervised nature of our experiments, model performance was evaluated using a synthetic outlier test dataset and a manually annotated outlier test dataset. Precision, recall and F1-scores on the synthetic outlier test dataset for the autoencoder model trained on one hundred thousand claims were 0.87, 1.0 and 0.93, respectively, while the results for these metrics on the manually annotated outlier test dataset were 0.36, 0.86 and 0.51, respectively. The model's performance on the manually annotated outlier test dataset improved further when trained on the larger thirty-three million claims dataset, with precision, recall and F1-scores of 0.48, 0.90 and 0.63, respectively. CONCLUSIONS: This study demonstrates the feasibility of leveraging unsupervised deep-learning methods to identify potential procedure overutilization from healthcare claims.
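The feature-weighted scoring idea can be sketched as follows; the weights and the stand-in reconstruction are illustrative assumptions, since the paper's exact loss is not reproduced here:

```python
import numpy as np

def weighted_recon_scores(X, X_hat, w):
    """Per-claim anomaly score: feature-weighted squared reconstruction error."""
    return ((X - X_hat) ** 2 * w).sum(axis=1)

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 6))                      # encoded claim features
X_hat = X + rng.normal(scale=0.05, size=X.shape)    # stand-in autoencoder output
X_hat[0, 3] += 5.0                                  # claim 0 reconstructs one
                                                    # procedure-code feature badly

# Hypothetical weights: up-weight the procedure-code features (last three columns)
w = np.array([1.0, 1.0, 1.0, 4.0, 4.0, 4.0])
scores = weighted_recon_scores(X, X_hat, w)
flagged = int(scores.argmax())                      # claim 0 stands out
```

Up-weighting the features of interest makes errors there dominate the score, so outliers on procedure codes rank ahead of claims that merely reconstruct demographics poorly.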
Subjects
Deep Learning; Humans; Unsupervised Machine Learning; Delivery of Health Care; Databases, Factual; Fraud
ABSTRACT
Smart grids (SGs) play a vital role in the smart city environment, which exploits digital technology, communication systems, and automation to effectively manage electricity generation, distribution, and consumption. SGs are a fundamental module of smart cities that aims to leverage technology and data to enhance citizens' quality of life and optimize resource consumption. The biggest challenge in dealing with SGs and smart cities is the potential for cyberattacks, including Distributed Denial of Service (DDoS) attacks. DDoS attacks involve overwhelming a system with a huge volume of traffic, causing disruptions and potentially leading to service outages. Mitigating and detecting DDoS attacks in SGs is of great significance for ensuring their stability and reliability. Therefore, this study develops a new White Shark Equilibrium Optimizer with a Hybrid Deep-Learning-based Cybersecurity Solution (WSEO-HDLCS) technique for a smart city environment. The goal of the WSEO-HDLCS technique is to recognize the presence of DDoS attacks in order to ensure cybersecurity. In the presented WSEO-HDLCS technique, the high-dimensionality data problem is resolved by the use of a WSEO-based feature selection (WSEO-FS) approach. In addition, the WSEO-HDLCS technique employs a stacked deep autoencoder (SDAE) model for DDoS attack detection. Moreover, the gravitational search algorithm (GSA) is utilized for the optimal selection of the hyperparameters of the SDAE model. The simulation outcome of the WSEO-HDLCS system is validated on the CICIDS-2017 dataset. The extensive simulation results highlight the promising performance of the WSEO-HDLCS methodology over existing methods.
ABSTRACT
The Korean film market has been rapidly growing, and the importance of explainable artificial intelligence (XAI) in the film industry is also increasing. In this highly competitive market, where producing a movie incurs substantial costs, it is crucial for film industry professionals to make informed decisions. To assist these professionals, we propose DRECE (short for Dimension REduction, Clustering, and classification for Explainable artificial intelligence), an XAI-powered box office classification and trend analysis model that provides valuable insights and data-driven decision-making opportunities for the Korean film industry. The DRECE framework starts with transforming multi-dimensional data into two dimensions through dimensionality reduction techniques, grouping similar data points through K-means clustering, and classifying movie clusters through machine-learning models. The XAI techniques used in the model make the decision-making process transparent, providing valuable insights for film industry professionals to improve the box office performance and maximize profits. With DRECE, the Korean film market can be understood in new and exciting ways, and decision-makers can make informed decisions to achieve success.
ABSTRACT
BACKGROUND: Drug repositioning has attracted widespread scholarly attention due to its effective reduction of the development cost and time of new drugs. However, existing drug repositioning methods based on computational analysis are limited by sparse data and classic fusion methods; thus, we use autoencoders and adaptive fusion methods to perform drug repositioning. RESULTS: In this study, a drug repositioning algorithm based on a deep autoencoder and adaptive fusion was proposed to mitigate the problems of decreased precision and low-efficiency multisource data fusion caused by data sparseness. Specifically, a drug is repositioned by fusing drug-disease associations, drug target proteins, drug chemical structures and drug side effects. First, drug feature data integrating drug target proteins and chemical structures were reduced in dimension via a deep autoencoder to characterize feature representations more densely and abstractly. Then, disease similarity was computed using drug-disease association data, while drug similarity was calculated from drug feature and drug-side effect data. Predictions of drug-disease associations were then calculated using a top-k neighbor method that is commonly used in predictive drug repositioning studies. Finally, a predicted matrix for drug-disease associations was acquired after fusing a wide variety of data via adaptive fusion. Based on experimental results, the proposed algorithm achieves higher precision and recall than the DRCFFS, SLAMS and BADR algorithms on the same dataset. CONCLUSION: The proposed algorithm contributes to investigating novel uses of drugs, as shown in a case study of Alzheimer's disease. Therefore, the proposed algorithm can serve as an auxiliary tool for clinical trials of drug repositioning.
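A top-k neighbor scorer of the kind referenced above can be sketched in a few lines: each drug's unknown associations are scored by the similarity-weighted votes of its k most similar drugs. The toy association and similarity matrices are hypothetical:

```python
import numpy as np

def topk_neighbor_scores(assoc, drug_sim, k=2):
    """Score drug-disease pairs from the similarity-weighted known
    associations of each drug's k most similar other drugs."""
    n = len(drug_sim)
    pred = np.zeros_like(assoc, dtype=float)
    for d in range(n):
        sims = drug_sim[d].copy()
        sims[d] = -np.inf                  # exclude the drug itself
        nbrs = np.argsort(sims)[-k:]       # indices of the k nearest neighbors
        w = drug_sim[d, nbrs]
        pred[d] = w @ assoc[nbrs] / w.sum()
    return pred

# Toy data: 4 drugs x 3 diseases known associations, plus drug-drug similarity
assoc = np.array([[1, 0, 0],
                  [1, 0, 1],
                  [0, 1, 0],
                  [0, 1, 1]], dtype=float)
drug_sim = np.array([[1.0, 0.9, 0.1, 0.2],
                     [0.9, 1.0, 0.2, 0.1],
                     [0.1, 0.2, 1.0, 0.8],
                     [0.2, 0.1, 0.8, 1.0]])
scores = topk_neighbor_scores(assoc, drug_sim, k=2)
```

High scores for pairs with no known association (here, drug 0 and disease 2) are the repositioning candidates the adaptive fusion step would then weigh against the other data sources.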
Subjects
Alzheimer Disease; Drug-Related Side Effects and Adverse Reactions; Algorithms; Computational Biology; Drug Repositioning; Humans
ABSTRACT
It is of utmost importance to develop a computational method for the accurate prediction of antioxidants, as they play a vital role in the prevention of several diseases caused by oxidative stress. In this correspondence, we present an effective computational methodology based on the notion of deep latent space encoding. A deep neural network classifier fused with an autoencoder learns class labels in a pruned latent space. This strategy eliminates the need to develop the classifier and the feature selection model separately, allowing the standalone model to effectively harness a discriminating feature space and perform improved predictions. A thorough analytical study is presented, along with PCA/t-SNE visualizations and PCA-GCNR scores, to show the discriminating power of the proposed method. The proposed method showed a high MCC value of 0.43 and a balanced accuracy of 76.2%, which is superior to the existing models. The model was also evaluated on an independent dataset, on which it outperformed contemporary methods by correctly identifying novel proteins with an accuracy of 95%.
Subjects
Antioxidants; Computational Biology/methods; Neural Networks, Computer; Proteins; Software; Algorithms; Databases, Protein; Humans; Workflow
ABSTRACT
Anomaly detection is one of the crucial tasks in daily infrastructure operations, as it can prevent massive damage to devices or resources, which may then lead to catastrophic outcomes. To address this challenge, we propose an automated solution that detects anomalous patterns in water levels and reports the analysis and the time/point(s) of abnormality. This research is motivated by the difficulty and time-consuming nature of managing the facilities responsible for controlling water levels, owing to the rare occurrence of abnormal patterns. Consequently, we employed a deep autoencoder, a type of artificial neural network architecture, to learn different patterns from the given sequences of data points and reconstruct them. We then use the patterns reconstructed by the deep autoencoder, together with a threshold, to separate abnormal patterns from normal ones. We used a stream of time-series data collected from sensors to train the model and then evaluate it, ready for deployment as the anomaly detection system framework. We ran extensive experiments on sensor data from water tanks. Our analysis shows why we conclude that the vanilla deep autoencoder is the most effective solution in this scenario.
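The reconstruct-then-threshold step can be sketched as follows, using a common mean-plus-three-sigma rule on training reconstruction errors (the paper's exact threshold choice is not specified here, and the error values are synthetic stand-ins):

```python
import numpy as np

rng = np.random.default_rng(7)
# Stand-in reconstruction errors of a trained autoencoder on normal
# water-level windows (small, right-skewed errors)
train_err = rng.gamma(shape=2.0, scale=0.05, size=5000)

# One common rule: flag anything beyond mean + 3 standard deviations
threshold = train_err.mean() + 3 * train_err.std()

# Errors on five new windows; two reconstruct poorly (likely anomalies)
new_err = np.array([0.08, 0.11, 0.95, 0.07, 1.30])
anomalies = np.where(new_err > threshold)[0]   # indices of flagged windows
```

Because the autoencoder only learns to reconstruct normal patterns, rare abnormal windows produce large errors and cross the threshold, which is exactly why this approach suits rarely occurring anomalies.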
Subjects
Neural Networks, Computer; Water
ABSTRACT
Prognostics and health management (PHM), with failure prognosis and maintenance decision-making at its core, is an advanced technology for improving the safety, reliability, and operational economy of engineering systems. However, studies of failure prognosis and maintenance decision-making have been conducted separately over the past years, and key challenges remain open when the joint problem is considered. The aim of this paper is to develop an integrated strategy for dynamic predictive maintenance scheduling (DPMS) based on a deep autoencoder and deep forest-assisted failure prognosis method. The proposed DPMS method involves a complete process from performing failure prognosis to making maintenance decisions. The first step is to extract representative features reflecting system degradation from raw sensor data by using a deep autoencoder. Then, the features are fed into the deep forest to compute the failure probabilities over moving time horizons. Finally, an optimal maintenance-related decision is made by quickly evaluating the costs of different decisions given the failure probabilities. Verification was accomplished using NASA's open datasets of aircraft engines, and the experimental results show that the proposed DPMS method outperforms several state-of-the-art methods, which can benefit precise maintenance decisions and reduce maintenance costs.
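The final decision step, choosing the action with the lower expected cost given a predicted failure probability, can be sketched as follows; the cost figures are hypothetical:

```python
def best_action(p_fail, c_maintain=10_000.0, c_failure=80_000.0):
    """Pick the maintenance action with the lower expected cost
    over the horizon, given the predicted failure probability."""
    expected = {
        "maintain_now": c_maintain,     # known cost of planned maintenance
        "wait": p_fail * c_failure,     # risk-weighted cost of unplanned failure
    }
    return min(expected, key=expected.get), expected

# At low predicted risk waiting is cheaper; at higher risk maintenance wins
action_low, _ = best_action(p_fail=0.05)    # 0.05 * 80,000 = 4,000 < 10,000
action_high, _ = best_action(p_fail=0.30)   # 0.30 * 80,000 = 24,000 > 10,000
```

Re-running this comparison each time the deep forest updates `p_fail` is what makes the schedule dynamic: the decision flips automatically once the predicted risk crosses the break-even point (here, p = 0.125).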
Subjects
Aircraft; Forests; Prognosis; Reproducibility of Results
ABSTRACT
Fault diagnosis and classification for machines are integral to condition monitoring in the industrial sector. However, in recent times, as sensor technology and artificial intelligence have developed, data-driven fault diagnosis and classification have been more widely investigated. The data-driven approach requires good-quality features to attain good fault classification accuracy, yet domain expertise and a fair amount of labeled data are important for better features. This paper proposes a deep auto-encoder (DAE) and convolutional neural network (CNN)-based bearing fault classification model using motor current signals of an induction motor (IM). Motor current signals can be easily and non-invasively collected from the motor. However, the current signal collected from industrial sources is highly contaminated with noise; feature calculation thus becomes very challenging. The DAE is utilized for estimating the nonlinear function of the system with the normal state data, and later, the residual signal is obtained. The subsequent CNN model then successfully classified the types of faults from the residual signals. Our proposed semi-supervised approach achieved very high classification accuracy (more than 99%). The inclusion of DAE was found to not only improve the accuracy significantly but also to be potentially useful when the amount of labeled data is small. The experimental outcomes are compared with some existing works on the same dataset, and the performance of this proposed combined approach is found to be comparable with them. In terms of the classification accuracy and other evaluation parameters, the overall method can be considered as an effective approach for bearing fault classification using the motor current signal.
Subjects
Artificial Intelligence; Neural Networks, Computer
ABSTRACT
Plastic scintillation detectors are widely utilized in radiation measurement because of their unique characteristics. However, they are generally used for counting applications because of the energy broadening effect and the absence of a photopeak in their spectra. To overcome these weaknesses, many studies on pseudo spectroscopy have been reported, but most of them have not been able to directly identify the energy of incident gamma rays. In this paper, we propose a method to reconstruct Compton edges in plastic gamma spectra using an artificial neural network for direct pseudo gamma spectroscopy. Spectra simulated using the MCNP 6.2 software were used to generate training and validation sets. Our model was trained to reconstruct Compton edges in plastic gamma spectra. In addition, we designed the dataset generation procedure so that our model could reconstruct Compton edges even for spectra with poor counting statistics. Minimum reconstructible counts for single isotopes, evaluated using the mean averaged percentage error metric, were 650 for 60Co, 2000 for 137Cs, 3050 for 22Na, and 3750 for 133Ba. The performance of our model was verified using spectra measured by a PVT detector. Although our model was trained using simulation data only, it successfully reconstructed Compton edges even in measured gamma spectra with poor counting statistics.
ABSTRACT
A blade rub-impact fault is one of the complex and frequently appearing faults in turbines. Due to their nonlinear and nonstationary nature, complex signal analysis techniques, which are expensive in terms of computation time, are required to extract valuable fault information from the vibration signals collected from rotor systems. In this work, a novel method for diagnosing blade rub-impact faults of different severity levels is proposed. Specifically, a deep undercomplete denoising autoencoder is first used to estimate the nonlinear function of the system under normal operating conditions. Next, the residual signals are computed as the difference between the original signals and their estimates produced by the autoencoder. Finally, these residual signals are used as inputs to a deep neural network to determine the current state of the rotor system. The experimental results demonstrate that the amplitudes of the residual signals reflect the changes in the states of the rotor system and the fault severity levels. Furthermore, these residual signals, in combination with the deep neural network, demonstrated promising fault identification results when applied to a complex nonlinear fault such as blade rubbing. To test the effectiveness of the proposed nonlinear fault diagnosis algorithm, the technique is compared with the autoregressive with external input Laguerre proportional-integral observer, a linear-model-based fault diagnosis technique.
ABSTRACT
Machine learning is becoming an increasingly popular approach for investigating spatially distributed and subtle neuroanatomical alterations in brain-based disorders. However, some machine learning models have been criticized for requiring a large number of cases in each experimental group, and for resembling a "black box" that provides little or no insight into the nature of the data. In this article, we propose an alternative conceptual and practical approach for investigating brain-based disorders which aims to overcome these limitations. We used an artificial neural network known as a "deep autoencoder" to create a normative model using structural magnetic resonance imaging data from 1,113 healthy people. We then used this model to estimate total and regional neuroanatomical deviation in individual patients with schizophrenia and autism spectrum disorder using two independent data sets (n = 263). We report that the model was able to generate different values of total neuroanatomical deviation for each disease under investigation relative to their control group (p < .005). Furthermore, the model revealed distinct patterns of neuroanatomical deviations for the two diseases, consistent with the existing neuroimaging literature. We conclude that the deep autoencoder provides a flexible and promising framework for assessing total and regional neuroanatomical deviations in neuropsychiatric populations.
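A simplified stand-in for the normative-deviation idea z-scores regional measures against a control distribution (the paper derives deviations from autoencoder reconstructions rather than raw measures; the volumes below are synthetic):

```python
import numpy as np

rng = np.random.default_rng(42)
# Hypothetical regional volumes: 1000 healthy controls x 10 brain regions
controls = rng.normal(loc=100.0, scale=5.0, size=(1000, 10))
mu, sigma = controls.mean(axis=0), controls.std(axis=0)

# One patient drawn from the same distribution, with pronounced
# volume reduction injected into region 3
patient = rng.normal(loc=100.0, scale=5.0, size=10)
patient[3] -= 25.0

z = (patient - mu) / sigma                  # regional deviation map
total_deviation = float(np.abs(z).sum())    # one simple total-deviation summary
most_deviant = int(np.abs(z).argmax())      # region driving the deviation
```

The regional map is what makes the approach interpretable rather than a black box: it localizes which regions drive a patient's total deviation, and the totals can be compared across diagnostic groups.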
Subjects
Autism Spectrum Disorder/diagnostic imaging; Brain/diagnostic imaging; Deep Learning; Neuroimaging/methods; Schizophrenia/diagnostic imaging; Adult; Female; Humans; Image Interpretation, Computer-Assisted/methods; Magnetic Resonance Imaging/methods; Male
ABSTRACT
Traffic congestion prediction is critical for implementing intelligent transportation systems that improve the efficiency and capacity of transportation networks. However, despite its importance, traffic congestion prediction has been far less investigated than traffic flow prediction, partially due to the severe lack of large-scale, high-quality traffic congestion data and advanced algorithms. This paper proposes an accessible and general workflow for acquiring large-scale traffic congestion data and creating traffic congestion datasets based on image analysis. With this workflow, we create a dataset named Seattle Area Traffic Congestion Status (SATCS), based on traffic congestion map snapshots from a publicly available online traffic service provider, the Washington State Department of Transportation. We then propose a deep autoencoder-based neural network model (DCPN) with symmetrical encoder and decoder layers to learn the temporal correlations of a transportation network and predict traffic congestion. Our experimental results on the SATCS dataset show that the proposed DCPN model can efficiently and effectively learn temporal relationships between congestion levels of the transportation network for traffic congestion forecasting. Our method outperforms two other state-of-the-art neural network models in prediction performance, generalization capability, and computational efficiency.
ABSTRACT
Increased accuracy and affordability of depth sensors such as Kinect have created a great depth-data source for various 3D-oriented applications. Specifically, 3D model retrieval is attracting attention in the field of computer vision and pattern recognition due to its numerous applications. A cross-domain retrieval approach such as depth image-based 3D model retrieval faces the challenges of occlusion, noise, and view variability present in both query and training data. In this paper, we propose a new supervised deep autoencoder approach, followed by semantic modeling, to retrieve 3D shapes based on depth images. The key novelty is the two-fold feature abstraction to cope with the incompleteness and ambiguity present in depth images. First, we develop a supervised autoencoder to extract robust features from both real depth images and synthetic ones rendered from 3D models, which are intended to balance the reconstruction and classification capabilities of mixed-domain data. Then, semantic modeling of the supervised autoencoder features offers the next level of abstraction to cope with the incompleteness and ambiguity of the depth data. Interestingly, unlike pairwise model structures, we argue that cross-domain retrieval is still possible using only a single deep network trained on real and synthetic data. The experimental results on the NYUD2 and ModelNet10 datasets demonstrate that the proposed supervised method outperforms recent approaches for cross-modal 3D model retrieval.
ABSTRACT
Determining cell types from single-cell transcriptomics data is fundamental for downstream analysis. However, cell clustering and data imputation still face computational challenges due to the high dropout rate, sparsity and dimensionality of single-cell data. Although some deep learning based solutions have been proposed to handle these challenges, they still cannot leverage gene attribute information and cell topology in a sensible way to explore consistent clustering. In this paper, we present scDeepFC, a deep information fusion-based single-cell data clustering method for cell clustering and data imputation. Specifically, scDeepFC uses a deep auto-encoder (DAE) network and a deep graph convolution network to embed high-dimensional gene attribute information and high-order cell-cell topological information into different low-dimensional representations, and then fuses them to generate a more comprehensive and accurate consensus representation via a deep information fusion network. In addition, scDeepFC integrates the zero-inflated negative binomial (ZINB) distribution into the DAE to model dropout events. By jointly optimizing the ZINB loss and the cell graph reconstruction loss, scDeepFC generates a salient embedding representation for clustering cells and imputing missing data. Extensive experiments on real single-cell datasets show that scDeepFC outperforms other popular single-cell analysis methods, and that both gene attribute and cell topology information improve cell clustering.
Subjects
Gene Expression Profiling; Single-Cell Gene Expression Analysis; Cluster Analysis; Single-Cell Analysis; Sequence Analysis, RNA
ABSTRACT
Ultrasonic testing is widely used for defect detection in polymer composites owing to advantages such as fast processing speed, simple operation, high reliability, and real-time monitoring. However, defect information in ultrasound images is not easily detectable because of the influence of ultrasound echoes and noise. In this study, a stable three-dimensional deep convolutional autoencoder (3D-DCA) was developed to identify defects in polymer composites. Through 3D convolutional operations, it can synchronously learn the spatiotemporal properties of the data volume. Subsequently, the depth receptive field (RF) of the hidden layer in the autoencoder maps the defect information to the original depth location, thereby mitigating the effects of the defect surface and bottom echoes. In addition, a dual-layer encoder was designed to improve the hidden layer visualization results. Consequently, the size, shape, and depth of the defects can be accurately determined. The feasibility of the method was demonstrated through its application to defect detection in carbon-fiber-reinforced polymers.