ABSTRACT
MicroRNAs (miRNAs) play crucial roles in several biological processes. Detecting their subcellular localizations is essential for deeper insight into their functions and mechanisms. Traditional methods for determining miRNA subcellular localizations are expensive, and computational methods offer a way to predict these localizations quickly. Although several computational methods have been proposed, their incomplete representations of miRNAs leave room for improvement. In this study, a novel computational method for predicting miRNA subcellular localizations, named PMiSLocMF, was developed. Because many miRNAs have multiple subcellular localizations, the method is a multi-label classifier. Several properties of miRNAs, namely miRNA sequences, miRNA functional similarity, and miRNA-disease, miRNA-drug, and miRNA-mRNA associations, were used to generate informative miRNA features. To this end, powerful algorithms [node2vec and graph attention auto-encoder (GATE)] and one newly designed scheme were adopted to process these properties, producing five feature types. All features were fed into self-attention and fully connected layers to make predictions. Cross-validation results indicated the high performance of PMiSLocMF, with accuracy higher than 0.83 and average area under the receiver operating characteristic curve (AUC) and area under the precision-recall curve (AUPR) exceeding 0.90 and 0.77, respectively. This performance was better than that of all previous methods evaluated on the same dataset. Further tests showed that using all feature types improves the performance of PMiSLocMF, and that GATE and the self-attention layer help enhance it. Finally, we analyzed in depth the influence of miRNA associations with diseases, drugs, and mRNAs on PMiSLocMF. The dataset and code are available at https://github.com/Gu20201017/PMiSLocMF.
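As a rough illustration of the fusion step described above, the sketch below applies scaled dot-product self-attention to a handful of feature vectors, one per feature type. It is not the PMiSLocMF implementation: the function names, the toy vectors, and the use of identity projections for queries, keys, and values are assumptions made to keep the example short.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(features):
    """Scaled dot-product self-attention with identity Q/K/V projections.

    `features` is a list of equal-length vectors (here: one per feature type).
    Returns (outputs, weights); weights[i][j] is how much vector i attends to j.
    """
    dim = len(features[0])
    weights = []
    for query in features:
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in features
        ]
        weights.append(softmax(scores))
    outputs = [
        [sum(w * value[d] for w, value in zip(row, features)) for d in range(dim)]
        for row in weights
    ]
    return outputs, weights


# Hypothetical toy embeddings standing in for the five feature types.
toy_features = [
    [0.2, 0.1, 0.0, 0.5],
    [0.9, 0.3, 0.4, 0.1],
    [0.0, 0.8, 0.2, 0.2],
    [0.4, 0.4, 0.4, 0.4],
    [0.1, 0.0, 0.7, 0.3],
]
fused, attention = self_attention(toy_features)
```

Each row of `attention` sums to one, so every fused vector is a weighted average of the five inputs. A trained model would learn separate projection matrices and pass the result on to fully connected layers.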
Subjects
Algorithms, Computational Biology, MicroRNAs, MicroRNAs/genetics, MicroRNAs/metabolism, Computational Biology/methods, Humans, Software, Messenger RNA/genetics, Messenger RNA/metabolism, ROC Curve
ABSTRACT
Single-cell multi-omics integration enables joint analysis at single-cell resolution, providing a more accurate understanding of complex biological systems, while spatial multi-omics integration benefits the exploration of cellular spatial heterogeneity and facilitates more comprehensive downstream analyses. Existing methods are mainly designed for single-cell multi-omics data, give little consideration to spatial information, and still have room for performance improvement. A reliable multi-omics integration method designed for both single-cell and spatially resolved data is therefore needed. We propose SSGATE, a multi-omics integration method based on a dual-path graph attention auto-encoder. It constructs neighborhood graphs from single-cell expression profiles or spatial coordinates, enabling it to process single-cell data and to use the spatial information in spatially resolved data. It also performs self-supervised learning for integration through graph attention auto-encoders on two paths. SSGATE is applied to the integration of transcriptomics and proteomics, including single-cell and spatially resolved data of various tissues from different sequencing technologies. SSGATE shows better performance and stronger robustness than competing methods and facilitates downstream analysis.
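The neighborhood graph mentioned above can be pictured with a small k-nearest-neighbor construction over spatial coordinates. This is a minimal sketch under assumptions: the function names, the value of `k`, and the toy spot coordinates are hypothetical, and SSGATE may build its graphs differently.

```python
import math


def knn_graph(coords, k):
    """Return {node: sorted list of its k nearest neighbours} by Euclidean distance."""
    graph = {}
    for i, point in enumerate(coords):
        distances = sorted(
            (math.dist(point, other), j)
            for j, other in enumerate(coords)
            if j != i
        )
        graph[i] = sorted(j for _, j in distances[:k])
    return graph


def symmetrize(graph):
    """Make the graph undirected: keep an edge if either endpoint selected it."""
    undirected = {node: set(neigh) for node, neigh in graph.items()}
    for node, neigh in graph.items():
        for other in neigh:
            undirected[other].add(node)
    return {node: sorted(neigh) for node, neigh in undirected.items()}


# Hypothetical spot coordinates forming two well-separated groups.
spots = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (5.0, 5.0), (6.0, 5.0), (5.0, 6.0)]
spatial_graph = symmetrize(knn_graph(spots, k=2))
```

The same routine works on expression profiles instead of coordinates, since it only needs a list of equal-length numeric vectors.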
Subjects
Single-Cell Analysis, Single-Cell Analysis/methods, Computational Biology/methods, Humans, Proteomics/methods, Algorithms, Transcriptome, Multiomics
ABSTRACT
Long noncoding RNAs (lncRNAs) are noncoding RNAs longer than 200 nucleotides. Numerous studies have shown that, although lncRNAs cannot be directly translated into proteins, they still play an important role in human growth processes by interacting with proteins. Since traditional biological experiments require considerable time and material costs to explore potential lncRNA-protein interactions (LPIs), several computational models have been proposed for this task. In this study, we introduce a novel deep learning method, combined graph auto-encoders (LPICGAE), to predict potential human LPIs. First, we apply a variational graph auto-encoder to learn low-dimensional representations from the high-dimensional features of lncRNAs and proteins. Then, a graph auto-encoder is used to reconstruct the adjacency matrix for inferring potential interactions between lncRNAs and proteins. Finally, we alternately minimize the losses of the two processes to obtain the final predicted interaction matrix. In 5-fold cross-validation experiments, our method achieves an average area under the receiver operating characteristic curve of 0.974 and an average accuracy of 0.985, which is better than six existing state-of-the-art computational methods. We believe that LPICGAE can help researchers efficiently identify more potential relationships between lncRNAs and proteins.
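Graph auto-encoders commonly reconstruct an adjacency matrix by passing inner products of node embeddings through a sigmoid. The sketch below shows that decoding step for lncRNA and protein embeddings. It assumes an inner-product decoder, which the abstract does not confirm for LPICGAE, and the embeddings are hypothetical values rather than outputs of a trained encoder.

```python
import math


def sigmoid(x):
    """Logistic function mapping a real score to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def decode_interactions(lnc_embeddings, protein_embeddings):
    """Score every lncRNA-protein pair as sigmoid(z_lncRNA . z_protein)."""
    return [
        [sigmoid(sum(a * b for a, b in zip(z_l, z_p))) for z_p in protein_embeddings]
        for z_l in lnc_embeddings
    ]


# Hypothetical 2-D embeddings; a trained encoder would produce these.
lnc_z = [[1.0, 0.5], [-0.5, 1.0]]
prot_z = [[2.0, 0.0], [0.0, -2.0], [1.0, 1.0]]
scores = decode_interactions(lnc_z, prot_z)
```

Entries of `scores` close to one mark pairs the model considers likely to interact; ranking the unobserved pairs by score gives candidate interactions.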
Subjects
Proteins, Long Noncoding RNA, Humans, Computational Biology/methods, Proteins/genetics, Proteins/metabolism, Long Noncoding RNA/genetics, Long Noncoding RNA/metabolism, Deep Learning
ABSTRACT
Metabolism refers to a series of orderly chemical reactions that maintain life activities in organisms. In healthy individuals, metabolism remains within a normal range. However, specific diseases can cause the levels of certain metabolites to increase or decrease abnormally, and detecting these deviations can aid in diagnosing a disease. Traditional biological experiments rely on extensive manual, repeated work, which is time consuming and labor intensive. To address this issue, we develop a deep learning model based on the auto-encoder and non-negative matrix factorization, named MDA-AENMF, to predict potential associations between metabolites and diseases. We integrate a variety of similarity networks and then acquire the characteristics of both metabolites and diseases through three specific modules. First, we obtain disease characteristics from the five-layer auto-encoder module. Next, in the non-negative matrix factorization module, we extract both metabolite and disease characteristics. In addition, the graph attention auto-encoder module provides metabolite characteristics. The characteristics from the three modules are then merged into a single, comprehensive feature vector for each metabolite-disease pair. Finally, we feed each feature vector and its label to a multi-layer perceptron for training. In 5-fold cross-validation, the model achieves an area under the receiver operating characteristic curve of 0.975 and an area under the precision-recall curve of 0.973, which are superior to those of existing state-of-the-art predictive methods. In case studies, most of the new associations predicted by MDA-AENMF have been verified, further highlighting its reliability in predicting potential relationships between metabolites and diseases.
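To make the non-negative matrix factorization module more concrete, the sketch below factors a small association matrix with the classic multiplicative update rules. It is an illustration under assumptions: the abstract does not say which NMF variant MDA-AENMF uses, and the matrix, rank, and function names are hypothetical.

```python
import random


def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


def frobenius_error(v, w, h):
    """Squared Frobenius norm of V - W H."""
    approx = matmul(w, h)
    return sum((x - y) ** 2 for rv, ra in zip(v, approx) for x, y in zip(rv, ra))


def nmf(v, rank, iterations=200, seed=0, eps=1e-9):
    """Factor a non-negative matrix V (n x m) into W (n x rank) and H (rank x m)."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    for _ in range(iterations):
        # H <- H * (W^T V) / (W^T W H), element-wise
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)]
             for i in range(rank)]
        # W <- W * (V H^T) / (W H H^T), element-wise
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(rank)]
             for i in range(n)]
    return w, h


# Hypothetical binary metabolite-disease association matrix (rows: metabolites).
associations = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [0, 1, 1, 1],
]
w_factors, h_factors = nmf(associations, rank=2)
```

Rows of `w_factors` can serve as metabolite features and columns of `h_factors` as disease features, which is one way the two kinds of characteristics could be read off a single factorization.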
Subjects
Algorithms, Neural Networks (Computer), Humans, Reproducibility of Results
ABSTRACT
Discovering the relationships between long non-coding RNAs (lncRNAs) and diseases is important for the treatment, diagnosis, and prevention of diseases. However, the number of currently identified lncRNA-disease associations is insufficient because wet laboratory experiments are expensive and labor intensive. It is therefore important to develop an efficient computational method for predicting potential lncRNA-disease associations. Previous methods showed that combining the lncRNA-disease associations predicted by different classification methods via a Learning to Rank (LTR) algorithm can be effective for this task. However, when the classification results are incorrect, the ranking results are inevitably affected. We propose the GraLTR-LDA predictor, based on biological knowledge graphs and a ranking framework, for predicting potential lncRNA-disease associations. First, a homogeneous graph and a heterogeneous graph are constructed by integrating multi-source biological information. Then, GraLTR-LDA integrates a graph auto-encoder and an attention mechanism to extract embedded features from the constructed graphs. Finally, GraLTR-LDA incorporates the embedded features into the LTR via feature-crossing statistical strategies to predict the priority order of diseases associated with query lncRNAs. Experimental results demonstrate that GraLTR-LDA outperforms other state-of-the-art predictors and can effectively detect potential lncRNA-disease associations. Availability and implementation: Datasets and source code are available at http://bliulab.net/GraLTR-LDA.
Subjects
Neoplasms, Long Noncoding RNA, Humans, Long Noncoding RNA/genetics, Computational Biology/methods, Algorithms, Software
ABSTRACT
Advances in single-cell ribonucleic acid sequencing (scRNA-seq) allow researchers to explore cellular heterogeneity and human diseases at single-cell resolution. Cell clustering is a prerequisite in scRNA-seq analysis because it identifies cell identities. However, the high dimensionality, noise, and significant sparsity of scRNA-seq data make clustering a major challenge. Although many methods have emerged, they still fail to fully explore the intrinsic properties of cells and the relationships among cells, which seriously affects downstream clustering performance. Here, we propose a new deep contrastive clustering algorithm called scDCCA. It integrates a denoising auto-encoder and a dual contrastive learning module into a deep clustering framework to extract valuable features and perform cell clustering. Specifically, to characterize the data better and learn robust representations, scDCCA uses a denoising auto-encoder based on the Zero-Inflated Negative Binomial model to extract low-dimensional features. Meanwhile, scDCCA incorporates a dual contrastive learning module to capture the pairwise proximity of cells. By increasing the similarities between positive pairs and the differences between negative ones, the contrasts at both the instance and the cluster level help the model learn more discriminative features and achieve better cell segregation. Furthermore, scDCCA couples feature learning with clustering, performing representation learning and cell clustering in an end-to-end manner. Experimental results on 14 real datasets show that scDCCA outperforms eight state-of-the-art methods in terms of accuracy, generalizability, scalability, and efficiency. Cell visualization and biological analysis demonstrate that scDCCA significantly improves clustering and facilitates downstream analysis of scRNA-seq data. The code is available at https://github.com/WJ319/scDCCA.
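The Zero-Inflated Negative Binomial (ZINB) model named above mixes a point mass at zero with a negative binomial count distribution. The sketch below evaluates its probability mass and the negative log-likelihood that a ZINB-based decoder would minimize. It uses the standard mean/dispersion parameterization as an assumption; the parameter names `mu`, `theta`, and `pi` and the toy counts are hypothetical and not taken from scDCCA.

```python
import math


def nb_log_pmf(x, mu, theta):
    """Log-probability of count x under a negative binomial with mean mu > 0
    and dispersion theta > 0."""
    return (
        math.lgamma(x + theta) - math.lgamma(theta) - math.lgamma(x + 1)
        + theta * math.log(theta / (theta + mu))
        + x * math.log(mu / (theta + mu))
    )


def zinb_pmf(x, mu, theta, pi):
    """Probability of count x under ZINB; pi is the extra-zero (dropout) probability."""
    nb = math.exp(nb_log_pmf(x, mu, theta))
    if x == 0:
        return pi + (1.0 - pi) * nb
    return (1.0 - pi) * nb


def zinb_nll(counts, mu, theta, pi):
    """Negative log-likelihood of observed counts under a shared ZINB."""
    return -sum(math.log(zinb_pmf(x, mu, theta, pi)) for x in counts)


# Hypothetical sparse expression counts for one gene across four cells.
sparse_counts = [0, 0, 0, 3]
nll_with_dropout = zinb_nll(sparse_counts, mu=2.0, theta=1.5, pi=0.5)
nll_without_dropout = zinb_nll(sparse_counts, mu=2.0, theta=1.5, pi=0.0)
```

For these zero-heavy counts the likelihood is higher (the loss lower) when dropout is allowed, which is the motivation for using ZINB rather than a plain negative binomial on sparse scRNA-seq data.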
Subjects
Gene Expression Profiling, Single-Cell Gene Expression Analysis, Humans, Gene Expression Profiling/methods, RNA Sequence Analysis/methods, Single-Cell Analysis/methods, Algorithms, Cluster Analysis
ABSTRACT
The emergence of single-cell RNA-seq (scRNA-seq) technology makes it possible to capture differences at the cellular level, which contributes to the study of cell heterogeneity. By extracting, amplifying, and sequencing the genome at the individual cell level, scRNA-seq, combined with clustering for downstream analysis, can identify unknown or rare cell types as well as genes differentially expressed in specific cell types under different conditions. Many clustering algorithms have been developed, with much progress. However, scRNA-seq data are often high-dimensional and sparse and can contain dropout events, which makes clustering performance unsatisfactory. To address this problem, a new deep learning framework, termed variational graph attention auto-encoder (VGAAE), is constructed for scRNA-seq data clustering. In the proposed VGAAE, a multi-head attention mechanism is introduced to learn more robust low-dimensional representations of the original scRNA-seq data, and self-supervised learning is then used to refine the clusters, whose number can be determined automatically using the Jaccard index. Experiments conducted on different datasets show that VGAAE outperforms several other state-of-the-art clustering methods.
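The Jaccard index referred to above measures the overlap of two sets as the size of their intersection divided by the size of their union. The sketch below computes it for two neighbor lists. How VGAAE applies the index to choose the number of clusters is not stated in the abstract, so the neighbor-list setting and the variable names are assumptions for illustration only.

```python
def jaccard_index(a, b):
    """Jaccard index |A & B| / |A | B| of two collections, treated as sets."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    if not union:
        return 1.0  # convention chosen here: two empty sets count as identical
    return len(set_a & set_b) / len(union)


# Hypothetical neighbour lists of two cells in a k-nearest-neighbour graph.
neighbours_cell_1 = [2, 3, 5, 8]
neighbours_cell_2 = [3, 5, 9]
similarity = jaccard_index(neighbours_cell_1, neighbours_cell_2)
```

Here the two cells share two of five distinct neighbors, so the index is 2/5.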
Subjects
Algorithms, Single-Cell Analysis, RNA Sequence Analysis/methods, Single-Cell Analysis/methods, Cluster Analysis, RNA, Gene Expression Profiling/methods
ABSTRACT
Recent advances in spatial transcriptomics (ST) have enabled comprehensive profiling of gene expression with spatial information in the context of the tissue microenvironment. However, as the resolution and scale of ST data improve, deciphering spatial domains precisely while ensuring efficiency and scalability remains challenging. Here, we develop SGCAST, an efficient auto-encoder framework to identify spatial domains. SGCAST adopts a symmetric graph convolutional auto-encoder to learn aggregated latent embeddings by integrating gene expression similarity and the proximity of spatial spots. This framework enables a mini-batch training strategy, which makes SGCAST memory-efficient and scalable to high-resolution spatial transcriptomic data with a large number of spots. SGCAST improves the overall accuracy of spatial domain identification on benchmarking data. We also validated the performance of SGCAST on ST datasets at various scales across multiple platforms. Our study illustrates the superior capacity of SGCAST in analyzing spatial transcriptomic data.
Subjects
Gene Expression Profiling, Transcriptome, Benchmarking, Learning
ABSTRACT
BACKGROUND: As noncoding RNAs, circular RNAs (circRNAs) can act as microRNA (miRNA) sponges due to their abundant miRNA binding sites, allowing them to regulate gene expression and influence disease development. Accurately identifying circRNA-miRNA associations (CMAs) helps in understanding complex disease mechanisms. Given that biological experiments are time consuming and labor intensive, alternative computational methods to predict CMAs are urgently needed. RESULTS: This study proposes a novel computational model named CMAGN, which incorporates several advanced computational methods, for predicting CMAs. First, similarity networks for circRNAs and miRNAs are constructed according to their sequences. A graph attention autoencoder is then applied to these networks to generate the first representations of circRNAs and miRNAs. The second representations of circRNAs and miRNAs are obtained from the CMA network via node2vec. The similarity networks of circRNAs and miRNAs are reconstructed on the basis of these new representations. Finally, network consistency projection is applied to the reconstructed similarity networks and the CMA network to generate a recommendation matrix. CONCLUSION: Five-fold cross-validation of CMAGN shows that the areas under the ROC and PR curves exceed 0.96 on two widely used CMA datasets, outperforming several existing models. Additional tests support the soundness of the CMAGN architecture and uncover its strengths and weaknesses.
Subjects
Computational Biology, MicroRNAs, Circular RNA, Circular RNA/genetics, MicroRNAs/genetics, Computational Biology/methods, Humans, Gene Regulatory Networks/genetics, Algorithms
ABSTRACT
BACKGROUND: Long non-coding RNAs (lncRNAs) can aid in the prevention, diagnosis, and treatment of a variety of complex human diseases, so it is crucial to establish a method that efficiently predicts lncRNA-disease associations. RESULTS: In this paper, we propose a method for predicting lncRNA-disease associations, named LDAGM, which is based on the Graph Convolutional Autoencoder and Multilayer Perceptron model. The method first extracts the functional similarity and Gaussian interaction profile kernel similarity of lncRNAs and miRNAs, as well as the semantic similarity and Gaussian interaction profile kernel similarity of diseases. It then constructs six homogeneous networks and deeply fuses them using a deep topology feature extraction method. The fused networks facilitate feature complementation and deep mining of the original association relationships, capturing the deep connections between nodes. Next, by combining the obtained deep topological features with the similarity network of lncRNA, disease, and miRNA interactions, we construct a multi-view heterogeneous network model. The Graph Convolutional Autoencoder is employed for nonlinear feature extraction. Finally, the extracted nonlinear features are combined with the deep topological features of the multi-view heterogeneous network to obtain the final feature representation of each lncRNA-disease pair. Prediction of lncRNA-disease associations is performed using the Multilayer Perceptron model. To enhance the performance and stability of the Multilayer Perceptron model, we introduce a hidden layer called the aggregation layer. Through a gate mechanism, it controls the flow of information between the hidden layers of the Multilayer Perceptron model, aiming to achieve optimal feature extraction from each hidden layer.
CONCLUSIONS: Parameter analysis, ablation studies, and comparison experiments verified the effectiveness of this method, and case studies verified the accuracy of this method in predicting lncRNA-disease association relationships.
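The Gaussian interaction profile kernel similarity used in the method above is commonly computed from the rows of a binary association matrix, with the bandwidth normalized by the mean squared norm of those rows. The sketch below follows that common formulation; whether LDAGM uses exactly this normalization is an assumption, and the toy matrix and function name are hypothetical.

```python
import math


def gip_kernel(association_rows):
    """Gaussian interaction profile kernel similarity between rows of a binary matrix.

    K(i, j) = exp(-gamma * ||row_i - row_j||^2), where gamma is 1 divided by the
    mean squared norm of the rows (a common bandwidth normalisation).
    """
    n = len(association_rows)
    mean_sq_norm = sum(sum(v * v for v in row) for row in association_rows) / n
    if mean_sq_norm == 0:
        raise ValueError("association matrix has no known associations")
    gamma = 1.0 / mean_sq_norm
    kernel = []
    for row_i in association_rows:
        kernel.append([
            math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(row_i, row_j)))
            for row_j in association_rows
        ])
    return kernel


# Hypothetical lncRNA-disease association matrix (rows: lncRNAs, columns: diseases).
lnc_disease = [
    [1, 1, 0],
    [1, 1, 0],
    [0, 0, 1],
]
lnc_similarity = gip_kernel(lnc_disease)
```

Applying the same function to the transposed matrix gives the corresponding disease similarity.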
Subjects
Neural Networks (Computer), Long Noncoding RNA, Long Noncoding RNA/genetics, Humans, Computational Biology/methods, MicroRNAs/genetics, Algorithms
ABSTRACT
How is information processed in the cerebral cortex? In most cases, recorded brain activity is averaged over many (stimulus) repetitions, which erases the fine structure of the neural signal. However, the brain is obviously a single-trial processor. We therefore demonstrate that an unsupervised machine learning approach can be used to extract meaningful information from electro-physiological recordings on a single-trial basis. We use an auto-encoder network to reduce the dimensions of single local field potential (LFP) events and create interpretable clusters of different neural activity patterns. Strikingly, certain LFP shapes correspond to latency differences in different recording channels. Hence, LFP shapes can be used to determine the direction of information flux in the cerebral cortex. Furthermore, after clustering, we decoded the cluster centroids to reverse-engineer the underlying prototypical LFP event shapes. To evaluate our approach, we applied it to both extra-cellular neural recordings in rodents and intra-cranial EEG recordings in humans. Finally, we find that single-channel LFP event shapes during spontaneous activity sample from the realm of possible stimulus-evoked event shapes, a finding that so far has only been demonstrated for multi-channel population coding.
Subjects
Deep Learning, Electroencephalography, Humans, Animals, Electroencephalography/methods, Cerebral Cortex/physiology, Male, Unsupervised Machine Learning, Rats, Adult, Female
ABSTRACT
MOTIVATION: Interactions between transcription factors (TFs) and their target genes form the knowledge foundation for biological research on transcriptional regulation; however, the number of known interactions is still limited by biological techniques. Existing computational methods relevant to the prediction of TF-target interactions are mostly proposed for predicting binding sites rather than directly predicting the interactions. To this end, we propose a graph attention-based autoencoder model to predict TF-target gene interactions using the known TF-target gene interaction network combined with two sequence-based and chemical gene characteristics, on the premise that unobserved interactions between transcription factors and target genes can be predicted by learning the pattern of the known ones. To the best of our knowledge, the proposed model is the first attempt to solve this problem by learning patterns from the known TF-target gene interaction network. RESULTS: In this paper, we formulate the prediction of TF-target gene interactions as a link prediction problem on a complex knowledge graph and propose a deep learning model called GraphTGI, which is composed of a graph attention-based encoder and a bilinear decoder. We evaluated the prediction performance of the proposed method on a real dataset, and the experimental results show that the model yields outstanding performance, with an average AUC value of 0.8864 +/- 0.0057 in 5-fold cross-validation. It is anticipated that the GraphTGI model can effectively and efficiently predict TF-target gene interactions on a large scale. AVAILABILITY: Python code and the datasets used in our studies are available at https://github.com/YanghanWu/GraphTGI.
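A bilinear decoder of the kind mentioned above scores a pair of node embeddings through a learned weight matrix, typically as sigmoid(z_TF^T W z_gene). The sketch below shows that scoring step only. The embeddings and the weight matrix are hypothetical fixed values, whereas in a trained model such as GraphTGI they would be learned, and the exact decoder form used there is an assumption.

```python
import math


def bilinear_score(z_tf, weight, z_gene):
    """Return sigmoid(z_tf^T W z_gene), read as the probability of an interaction."""
    projected = [sum(w * g for w, g in zip(row, z_gene)) for row in weight]  # W z_gene
    logit = sum(t * p for t, p in zip(z_tf, projected))                      # z_tf . (W z_gene)
    return 1.0 / (1.0 + math.exp(-logit))


# Hypothetical 2-D embeddings and decoder weights.
weight_matrix = [[0.5, 2.0], [1.0, -1.0]]
forward = bilinear_score([1.0, 0.0], weight_matrix, [1.0, 1.0])
reverse = bilinear_score([1.0, 1.0], weight_matrix, [1.0, 0.0])
```

Because `weight_matrix` is not symmetric, swapping the two roles changes the score, which suits a directed relation such as a transcription factor regulating a target gene.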
Subjects
Neural Networks (Computer)
ABSTRACT
Single-cell RNA sequencing (scRNA-seq) measures the transcriptome at the level of individual cells, paving the way for the identification of cell subpopulations. Although deep learning has been successfully applied to scRNA-seq data, these algorithms are criticized for their unsatisfactory performance and poor interpretability of patterns, which stem from the noise, high dimensionality, and extreme sparsity of scRNA-seq data. To address these issues, a novel deep learning subspace clustering algorithm (scGDC) for cell types in scRNA-seq data is proposed, which simultaneously learns the deep features and topological structure of cells. Specifically, scGDC extends the auto-encoder by introducing a self-representation layer to extract deep features of cells and learns an affinity graph of cells, which provides a better and more comprehensive strategy to characterize the structure of cell types. To address the heterogeneity of scRNA-seq data, scGDC projects cells of various types onto different subspaces, where cell types, particularly rare ones, are well discriminated by generative adversarial learning. Furthermore, scGDC joins deep feature extraction, structural learning, and cell type discovery, so that features of cells are extracted under the guidance of cell types, thereby improving performance. A total of 15 scRNA-seq datasets from various tissues and organisms, with the number of cells ranging from 56 to 63,103, are used to validate performance, and experimental results demonstrate that scGDC significantly outperforms 14 state-of-the-art methods in terms of various measurements (an average improvement of 25.51%), with (rare) cell types significantly associated with the topology of the affinity graph of cells. The proposed model and algorithm provide an effective strategy for the analysis of scRNA-seq data. The software is written in Python and is freely available for academic use at https://github.com/xkmaxidian/scGDC.
Subjects
Small Cytoplasmic RNA, Algorithms, Cluster Analysis, Gene Expression Profiling/methods, RNA Sequence Analysis/methods, Single-Cell Analysis/methods
ABSTRACT
CRISPR-Cas9 screens facilitate the discovery of gene functional relationships and phenotype-specific dependencies. The Cancer Dependency Map (DepMap) is the largest compendium of whole-genome CRISPR screens aimed at identifying cancer-specific genetic dependencies across human cell lines. A mitochondria-associated bias has been previously reported to mask signals for genes involved in other functions, and thus, methods for normalizing this dominant signal to improve co-essentiality networks are of interest. In this study, we explore three unsupervised dimensionality reduction methods (autoencoders, robust principal component analysis (PCA), and classical PCA) for normalizing the DepMap to improve functional networks extracted from these data. We propose a novel "onion" normalization technique to combine several normalized data layers into a single network. Benchmarking analyses reveal that robust PCA combined with onion normalization outperforms existing methods for normalizing the DepMap. Our work demonstrates the value of removing low-dimensional signals from the DepMap before constructing functional gene networks and provides generalizable dimensionality reduction-based normalization tools.
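The idea of removing a dominant low-dimensional signal can be sketched with classical PCA: find the leading principal direction and subtract each row's projection onto it. The example below does this for a single component using power iteration. It is a simplified illustration under assumptions, covering neither robust PCA, autoencoders, nor the "onion" combination step, and the toy scores and function names are hypothetical.

```python
import math


def center_columns(rows):
    """Subtract each column's mean so the data are centred."""
    means = [sum(col) / len(rows) for col in zip(*rows)]
    return [[x - m for x, m in zip(row, means)] for row in rows]


def first_component(rows, iterations=500):
    """Leading principal direction of centred data, via power iteration on X^T X."""
    dim = len(rows[0])
    vec = [1.0 / math.sqrt(dim)] * dim
    for _ in range(iterations):
        scores = [sum(x * v for x, v in zip(row, vec)) for row in rows]  # X v
        new = [sum(s * row[d] for s, row in zip(scores, rows)) for d in range(dim)]  # X^T (X v)
        norm = math.sqrt(sum(x * x for x in new))
        vec = [x / norm for x in new]
    return vec


def remove_first_component(rows):
    """Centre the data, then subtract each row's projection onto the leading direction."""
    centred = center_columns(rows)
    pc = first_component(centred)
    cleaned = []
    for row in centred:
        score = sum(x * p for x, p in zip(row, pc))
        cleaned.append([x - score * p for x, p in zip(row, pc)])
    return cleaned, pc


# Hypothetical gene-by-cell-line scores dominated by one shared signal.
toy_scores = [
    [2.0, 1.9, 2.1],
    [-1.0, -1.1, -0.9],
    [0.5, 0.4, 0.7],
    [-1.5, -1.2, -1.9],
]
normalized, direction = remove_first_component(toy_scores)
```

After this step the rows of `normalized` have no component along `direction`, so correlations computed from them reflect only the remaining, weaker structure.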
Subjects
Gene Regulatory Networks, Oncogenes, Humans, Tumor Cell Line, CRISPR-Cas Systems/genetics
ABSTRACT
Over the last decade, automatic chemical design frameworks for discovering molecules with drug-like properties have progressed significantly. Among them, the variational autoencoder (VAE) is a cutting-edge approach that models a tractable latent space of the molecular space. In particular, the use of a VAE along with a property estimator has attracted considerable interest because it enables gradient-based optimization of a given molecule. However, although successful results have been achieved experimentally, the theoretical background and prerequisites for the correct operation of this method have not yet been clarified. We therefore theoretically analyze and rigorously reconstruct the entire framework. From the perspective of parameterized distributions and information theory, we first describe how the previous model overcomes the limitations of the beta VAE in discovering molecules with the desired properties. Furthermore, we describe the prerequisites for training this model. Next, from the log-likelihood perspective of each term, we reformulate the objectives for exploring the latent space to generate drug-like molecules. We also define distributional constraints that keep the search away from invalid molecules. We demonstrate that our model can discover a novel chemical compound targeting BCL-2 family proteins in a de novo approach. Through the theoretical analysis and practical implementation, the importance of these prerequisites and constraints for operating the model was verified.
Subjects
Algorithms, Drug Design, Humans, Proto-Oncogene Proteins c-bcl-2/antagonists & inhibitors, Proto-Oncogene Proteins c-bcl-2/metabolism, Proto-Oncogene Proteins c-bcl-2/chemistry
ABSTRACT
Long non-coding RNAs (lncRNAs) have been shown to be closely associated with cancer metastatic events (CMEs, e.g., cancer cell invasion, intravasation, extravasation, proliferation) that collaboratively accelerate malignant cancer spread and cause a high mortality rate in patients. Clinical trials may accurately uncover the relationships between lncRNAs and CMEs; however, they are time-consuming and expensive. With the accumulation of data, there is an urgent need for efficient ways to identify these relationships. Herein, a graph embedding representation-based predictor (VGEA-LCME) for exploring latent lncRNA-CME associations is introduced. In VGEA-LCME, a heterogeneous combined network is constructed by integrating similarity and linkage matrices that preserve the internal and external characteristics of the networks, and a variational graph auto-encoder serves as a feature generator to represent arbitrary lncRNA-CME pairs. The final robust prediction is obtained by an ensemble classifier strategy via cross-validation. Experimental comparisons and literature verification show the remarkable performance of VGEA-LCME, although the similarities between CMEs are challenging to calculate. In addition, VGEA-LCME can further identify organ-specific CMEs. To the best of our knowledge, this is the first computational attempt to discover potential relationships between lncRNAs and CMEs. It may provide support and new insight for guiding experimental research on metastatic cancers. The source code and data are available at https://github.com/zhuyuan-cug/VGAE-LCME.
Subjects
Neoplasms, Long Noncoding RNA, Humans, Long Noncoding RNA/genetics, Neoplasms/genetics, Computational Biology, Algorithms
ABSTRACT
Our approach comprises image preprocessing, feature extraction using the SqueezeNet model, hyperparameter optimization using the Equilibrium Optimizer (EO) algorithm, and classification using a Stacked Autoencoder (SAE) model, carried out as a series of separate steps. During image preprocessing, contrast limited adaptive histogram equalization (CLAHE) is used to improve contrast, and Adaptive Bilateral Filtering (ABF) is used to remove noise. The SqueezeNet model extracts relevant features from the preprocessed images, and the EO technique fine-tunes the hyperparameters. Finally, the SAE model classifies the diseases that affect the grape leaf. The EODTL-GLDC technique was evaluated on the New Plant Diseases dataset, and the results were examined from several perspectives. The results demonstrate that this model outperforms other deep learning techniques and more conventional machine learning methods. Specifically, with an 80:20 split, the technique attained a precision of 96.31% on the testing set and 96.88% on the training set. These results provide further evidence that the proposed strategy is successful in automating the detection and classification of grape leaf diseases.
Subjects
Carbamoyl-Phosphate Synthase I Deficiency Disease, Malnutrition, Vitis, Machine Learning, Plant Leaves
ABSTRACT
Due to the massive growth in Internet of Things (IoT) devices, it is necessary to properly identify and authorize the devices connected to a given network and to protect them against attacks. In this manuscript, IoT Device Type Identification based on Variational Auto Encoder Wasserstein Generative Adversarial Network optimized with Pelican Optimization Algorithm (IoT-DTI-VAWGAN-POA) is proposed for prolonging IoT security. The proposed technique comprises three phases: data collection, feature extraction, and IoT device type detection. First, a real network traffic dataset is gathered from distinct IoT device types, such as baby monitors and security cameras. In the feature extraction phase, the network traffic feature vector comprises packet sizes, mean, variance, and kurtosis, derived using the adaptive and concise empirical wavelet transform. The extracted features are then supplied to VAWGAN, which identifies the IoT devices as known or unknown. The Pelican Optimization Algorithm (POA) is used to optimize the weight factors of VAWGAN for better IoT device type identification. The proposed IoT-DTI-VAWGAN-POA method is implemented in Python, and its performance is examined using metrics such as accuracy, precision, F-measure, sensitivity, error rate, computational complexity, and ROC. It provides 33.41%, 32.01%, and 31.65% higher accuracy, and 44.78%, 43.24%, and 48.98% lower error rate compared to the existing methods.
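The statistical part of the feature vector described above (mean, variance, and kurtosis) can be sketched directly from a list of packet sizes. This example computes the statistics on raw packet sizes and does not apply the empirical wavelet transform that the abstract mentions; the function name and the sample values are hypothetical.

```python
def traffic_features(packet_sizes):
    """Summary statistics of one device's packet sizes (in bytes)."""
    n = len(packet_sizes)
    mean = sum(packet_sizes) / n
    variance = sum((x - mean) ** 2 for x in packet_sizes) / n  # population variance
    if variance == 0:
        kurtosis = 0.0  # convention chosen here for constant traffic
    else:
        fourth_moment = sum((x - mean) ** 4 for x in packet_sizes) / n
        kurtosis = fourth_moment / variance ** 2  # non-excess kurtosis (normal = 3)
    return {"mean": mean, "variance": variance, "kurtosis": kurtosis}


# Hypothetical packet sizes observed from a single device.
features = traffic_features([1, 2, 3, 4, 5])
```

A classifier would receive one such feature vector per device or per traffic window, alongside any other features the pipeline extracts.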
Subjects
Algorithms, Computer Security, Internet of Things, Neural Networks (Computer), Humans
ABSTRACT
Light Sheet Fluorescence Microscopy (LSFM) has emerged as a valuable tool for neurobiologists, enabling rapid and high-quality volumetric imaging of mouse brains. However, inherent artifacts and distortions introduced during the imaging process necessitate careful enhancement of LSFM images for optimal 3D reconstructions. This work aims to correct images slice by slice before reconstructing 3D volumes. Our approach involves a three-step process: first, a deblurring algorithm based on the work of K. Becker; second, an automatic contrast enhancement; and third, a convolutional denoising auto-encoder with skip connections that addresses the noise introduced by contrast enhancement and handles mixed Poisson-Gaussian noise particularly well. Additionally, we tackle the challenge of axial distortion in LSFM by introducing an approach based on an auto-encoder trained on bead calibration images. The proposed pipeline is a complete solution, with promising results that surpass existing methods in denoising LSFM images. These advancements have the potential to significantly improve the interpretation of biological data.
ABSTRACT
BACKGROUND: Cancer subtype classification is helpful for personalized cancer treatment. Although some approaches have been developed to classify cancer subtypes based on high-dimensional gene expression data, it is difficult to obtain satisfactory classification results. Meanwhile, some cancers have been well studied and classified into subtypes that are adopted by most researchers. Hence, this prior knowledge is significant for further identifying new meaningful subtypes. RESULTS: In this paper, we present ForestSubtype, a combined parallel random forest and autoencoder approach for cancer subtype identification based on high-dimensional gene expression data. First, ForestSubtype uses parallel random forests and the prior knowledge of cancer subtypes to train a module and extract significant candidate features. Second, ForestSubtype uses a random forest as the base module and ten parallel random forests to compute each feature weight and rank the features separately. The intersection of the features with the larger weights output by the ten parallel random forests is then taken as the subsequent candidate features. Third, ForestSubtype uses an autoencoder to condense the selected features into two-dimensional data. Fourth, ForestSubtype uses k-means++ to obtain new cancer subtype identification results. In this paper, the breast cancer gene expression data obtained from The Cancer Genome Atlas are used for training and validation, and an independent breast cancer dataset from the Molecular Taxonomy of Breast Cancer International Consortium is used for testing. Additionally, we use two other cancer datasets to validate the generalizability of ForestSubtype. ForestSubtype outperforms the other two methods in terms of cluster distribution and internal and external metrics. The open-source code is available at https://github.com/lffyd/ForestSubtype.
CONCLUSIONS: Our work shows that combining high-dimensional gene expression data with parallel random forests and an autoencoder, guided by prior knowledge, can identify new subtypes more effectively than existing methods of cancer subtype classification.
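The feature-selection step described in the RESULTS, which keeps only features ranked highly by every parallel random forest, can be sketched as a set intersection over per-forest importance scores. The example assumes the importances have already been computed; the feature names, scores, cut-off `top_n`, and function names are hypothetical, and three runs stand in for the ten forests.

```python
def top_features(importances, top_n):
    """Names of the top_n features by importance within one forest."""
    ranked = sorted(importances, key=importances.get, reverse=True)
    return set(ranked[:top_n])


def consensus_features(importance_runs, top_n):
    """Features that appear in the top_n of every run (the intersection)."""
    selected = [top_features(run, top_n) for run in importance_runs]
    return sorted(set.intersection(*selected))


# Hypothetical importance scores from three independently trained forests.
runs = [
    {"gene_a": 0.40, "gene_b": 0.30, "gene_c": 0.20, "gene_d": 0.10},
    {"gene_a": 0.35, "gene_b": 0.15, "gene_c": 0.30, "gene_d": 0.20},
    {"gene_a": 0.50, "gene_b": 0.25, "gene_c": 0.05, "gene_d": 0.20},
]
candidates = consensus_features(runs, top_n=2)
```

Only `gene_a` is ranked in the top two by all three runs, so it alone survives; requiring agreement across forests filters out features whose high rank depends on one particular random seed.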