RESUMO
Electronic Health Records (EHRs) contain various valuable medical entities and their relationships. Although the extraction of biomedical relationships has achieved good results in the mining of electronic health records and the construction of biomedical knowledge bases, there are still some problems. There may be implied complex associations between entities and relationships in overlapping triplets, and ignoring these interactions may lead to a decrease in the accuracy of entity extraction. To address this issue, a joint extraction model for medical entity relations based on a relation attention mechanism is proposed. The relation extraction module identifies candidate relationships within a sentence. The attention mechanism based on these relationships assigns weights to contextual words in the sentence that are associated with different relationships. Additionally, it extracts the subject and object entities. Under a specific relationship, entity vector representations are utilized to construct a global entity matching matrix based on Biaffine transformations. This matrix is designed to enhance the semantic dependencies and relational representations between entities, enabling triplet extraction. This allows the two subtasks of named entity recognition and relation extraction to be interrelated, fully utilizing contextual information within the sentence, and effectively addresses the issue of overlapping triplets. Experimental observations from the CMeIE Chinese medical relation extraction dataset and the Baidu2019 Chinese dataset confirm that our approach yields the superior F1 score across all cutting-edge baselines. Moreover, it offers substantial performance improvements in intricate situations involving diverse overlapping patterns, multitudes of triplets, and cross-sentence triplets.
Assuntos
Mineração de Dados , Registros Eletrônicos de Saúde , Algoritmos , China , Mineração de Dados/métodos , Processamento de Linguagem Natural , SemânticaRESUMO
Structural health monitoring for roads is an important task that supports inspection of transportation infrastructure. This paper explores deep learning techniques for crack detection in road images and proposes an automatic pixel-level semantic road crack image segmentation method based on a Swin transformer. This method employs Swin-T as the backbone network to extract feature information from crack images at various levels and utilizes the texture unit to extract the texture and edge characteristic information of cracks. The refinement attention module (RAM) and panoramic feature module (PFM) then merge these diverse features, ultimately refining the segmentation results. This method is called FetNet. We collect four public real-world datasets and conduct extensive experiments, comparing FetNet with various deep-learning methods. FetNet achieves the highest precision of 90.4%, a recall of 85.3%, an F1 score of 87.9%, and a mean intersection over union of 78.6% on the Crack500 dataset. The experimental results show that the FetNet approach surpasses other advanced models in terms of crack segmentation accuracy and exhibits excellent generalizability for use in complex scenes.
RESUMO
BACKGROUND: Molecular biology is crucial for drug discovery, protein design, and human health. Due to the vastness of the drug-like chemical space, depending on biomedical experts to manually design molecules is exceedingly expensive. Utilizing generative methods with deep learning technology offers an effective approach to streamline the search space for molecular design and save costs. This paper introduces a novel E(3)-equivariant score-based diffusion framework for 3D molecular generation via SDEs, aiming to address the constraints of unified Gaussian diffusion methods. Within the proposed framework EMDS, the complete diffusion is decomposed into separate diffusion processes for distinct components of the molecular feature space, while the modeling processes also capture the complex dependency among these components. Moreover, angle and torsion angle information is integrated into the networks to enhance the modeling of atom coordinates and utilize spatial information more effectively. RESULTS: Experiments on the widely utilized QM9 dataset demonstrate that our proposed framework significantly outperforms the state-of-the-art methods in all evaluation metrics for 3D molecular generation. Additionally, ablation experiments are conducted to highlight the contribution of key components in our framework, demonstrating the effectiveness of the proposed framework and the performance improvements of incorporating angle and torsion angle information for molecular generation. Finally, the comparative results of distribution show that our method is highly effective in generating molecules that closely resemble the actual scenario. CONCLUSION: Through the experiments and comparative results, our framework clearly outperforms previous 3D molecular generation methods, exhibiting significantly better capacity for modeling chemically realistic molecules. The excellent performance of EMDS in 3D molecular generation brings novel and encouraging opportunities for tackling challenging biomedical molecule and protein scenarios.
Assuntos
Aprendizado Profundo , Modelos Moleculares , Biologia Computacional/métodos , Algoritmos , Desenho de Fármacos , Descoberta de Drogas/métodosRESUMO
Contrastive learning (CL) has emerged as a powerful approach for self-supervised learning. However, it suffers from sampling bias, which hinders its performance. While the mainstream solutions, hard negative mining (HNM) and supervised CL (SCL), have been proposed to mitigate this critical issue, they do not effectively address graph CL (GCL). To address it, we propose graph positive sampling (GPS) and three contrastive objectives. The former is a novel learning paradigm designed to leverage the inherent properties of graphs for improved GCL models, which utilizes four complementary similarity measurements, including node centrality, topological distance, neighborhood overlapping, and semantic distance, to select positive counterparts for each node. Notably, GPS operates without relying on true labels and enables preprocessing applications. The latter aims to fuse positive samples and enhance representative selection in the semantic space. We release three node-level models with GPS and conduct extensive experiments on public datasets. The results demonstrate the superiority of GPS over state-of-the-art (SOTA) baselines and debiasing methods. In addition, the GPS has also been proven to be versatile, adaptive, and flexible.
RESUMO
Predicting the interaction affinity between drugs and target proteins is crucial for rapid and accurate drug discovery and repositioning. Therefore, more accurate prediction of DTA has become a key area of research in the field of drug discovery and drug repositioning. However, traditional experimental methods have disadvantages such as long operation cycles, high manpower requirements, and high economic costs, making it difficult to predict specific interactions between drugs and target proteins quickly and accurately. Some methods mainly use the SMILES sequence of drugs and the primary structure of proteins as inputs, ignoring the graph information such as bond encoding, degree centrality encoding, spatial encoding of drug molecule graphs, and the structural information of proteins such as secondary structure and accessible surface area. Moreover, previous methods were based on protein sequences to learn feature representations, neglecting the completeness of information. To address the completeness of drug and protein structure information, we propose a Transformer graph-based early fusion research approach for drug-target affinity prediction (GEFormerDTA). Our method reduces prediction errors caused by insufficient feature learning. Experimental results on Davis and KIBA datasets showed a better prediction of drugtarget affinity than existing affinity prediction methods.
Assuntos
Descoberta de Drogas , Reposicionamento de Medicamentos , Sequência de Aminoácidos , Fontes de Energia Elétrica , AprendizagemRESUMO
Functional molecular module (i.e., gene-miRNA co-modules and gene-miRNA-lncRNA triple-layer modules) analysis can dissect complex regulations underlying etiology or phenotypes. However, current module detection methods lack an appropriate usage and effective model of multi-omics data and cross-layer regulations of heterogeneous molecules, causing the loss of critical genetic information and corrupting the detection performance. In this study, we propose a heterogeneous network co-clustering framework (HetFCM) to detect functional co-modules. HetFCM introduces an attributed heterogeneous network to jointly model interplays and multi-type attributes of different molecules, and applies multiple variational graph autoencoders on the network to generate cross-layer association matrices, then it performs adaptive weighted co-clustering on association matrices and attribute data to identify co-modules of heterogeneous molecules. Empirical study on Human and Maize datasets reveals that HetFCM can find out co-modules characterized with denser topology and more significant functions, which are associated with human breast cancer (subtypes) and maize phenotypes (i.e., lipid storage, drought tolerance and oil content). HetFCM is a useful tool to detect co-modules and can be applied to multi-layer functional modules, yielding novel insights for analyzing molecular mechanisms. We also developed a user-friendly module detection and analysis tool and shared it at http://www.sdu-idea.cn/FMDTool.
Assuntos
Neoplasias da Mama , Análise por Conglomerados , Redes Reguladoras de Genes , Zea mays , Feminino , Humanos , Neoplasias da Mama/genética , Perfilação da Expressão Gênica/métodos , MicroRNAs/genética , Fenótipo , Zea mays/genéticaRESUMO
Finding the causal structure from a set of variables given observational data is a crucial task in many scientific areas. Most algorithms focus on discovering the global causal graph but few efforts have been made toward the local causal structure (LCS), which is of wide practical significance and easier to obtain. LCS learning faces the challenges of neighborhood determination and edge orientation. Available LCS algorithms build on conditional independence (CI) tests, they suffer the poor accuracy due to noises, various data generation mechanisms, and small-size samples of real-world applications, where CI tests do not work. In addition, they can only find the Markov equivalence class, leaving some edges undirected. In this article, we propose a GradieNt-based LCS learning approach (GraN-LCS) to determine neighbors and orient edges simultaneously in a gradient-descent way, and, thus, to explore LCS more accurately. GraN-LCS formulates the causal graph search as minimizing an acyclicity regularized score function, which can be optimized by efficient gradient-based solvers. GraN-LCS constructs a multilayer perceptron (MLP) to simultaneously fit all other variables with respect to a target variable and defines an acyclicity-constrained local recovery loss to promote the exploration of local graphs and to find out direct causes and effects of the target variable. To improve the efficacy, it applies preliminary neighborhood selection (PNS) to sketch the raw causal structure and further incorporates an l1 -norm-based feature selection on the first layer of MLP to reduce the scale of candidate variables and to pursue sparse weight matrix. GraN-LCS finally outputs LCS based on the sparse weighted adjacency matrix learned from MLPs. We conduct experiments on both synthetic and real-world datasets and verify its efficacy by comparing against state-of-the-art baselines. A detailed ablation study investigates the impact of key components of GraN-LCS and the results prove their contribution.
RESUMO
Transcription factors (TFs), transcription co-factors (TcoFs) and their target genes perform essential functions in diseases and biological processes. KnockTF 2.0 (http://www.licpathway.net/KnockTF/index.html) aims to provide comprehensive gene expression profile datasets before/after T(co)F knockdown/knockout across multiple tissue/cell types of different species. Compared with KnockTF 1.0, KnockTF 2.0 has the following improvements: (i) Newly added T(co)F knockdown/knockout datasets in mice, Arabidopsis thaliana and Zea mays and also an expanded scale of datasets in humans. Currently, KnockTF 2.0 stores 1468 manually curated RNA-seq and microarray datasets associated with 612 TFs and 172 TcoFs disrupted by different knockdown/knockout techniques, which are 2.5 times larger than those of KnockTF 1.0. (ii) Newly added (epi)genetic annotations for T(co)F target genes in humans and mice, such as super-enhancers, common SNPs, methylation sites and chromatin interactions. (iii) Newly embedded and updated search and analysis tools, including T(co)F Enrichment (GSEA), Pathway Downstream Analysis and Search by Target Gene (BLAST). KnockTF 2.0 is a comprehensive update of KnockTF 1.0, which provides more T(co)F knockdown/knockout datasets and (epi)genetic annotations across multiple species than KnockTF 1.0. KnockTF 2.0 facilitates not only the identification of functional T(co)Fs and target genes but also the investigation of their roles in the physiological and pathological processes.
Assuntos
Bases de Dados Genéticas , Fatores de Transcrição , Transcriptoma , Animais , Humanos , Camundongos , Fatores de Transcrição/genética , Fatores de Transcrição/metabolismo , Internet , Marcação de Genes , Arabidopsis , Zea maysRESUMO
Determining cell types by single-cell transcriptomics data is fundamental for downstream analysis. However, cell clustering and data imputation still face the computation challenges, due to the high dropout rate, sparsity and dimensionality of single-cell data. Although some deep learning based solutions have been proposed to handle these challenges, they still can not leverage gene attribute information and cell topology in a sensible way to explore the consistent clustering. In this paper, we present scDeepFC, a deep information fusion-based single-cell data clustering method for cell clustering and data imputation. Specifically, scDeepFC uses a deep auto-encoder (DAE) network and a deep graph convolution network to embed high-dimensional gene attribute information and high-order cell-cell topological information into different low-dimensional representations, and then fuses them to generate a more comprehensive and accurate consensus representation via a deep information fusion network. In addition, scDeepFC integrates the zero-inflated negative binomial (ZINB) into DAE to model the dropout events. By jointly optimizing the ZINB loss and cell graph reconstruction loss, scDeepFC generates a salient embedding representation for clustering cells and imputing missing data. Extensive experiments on real single-cell datasets prove that scDeepFC outperforms other popular single-cell analysis methods. Both the gene attribute and cell topology information can improve the cell clustering.
Assuntos
Perfilação da Expressão Gênica , Análise da Expressão Gênica de Célula Única , Análise por Conglomerados , Análise de Célula Única , Análise de Sequência de RNARESUMO
Colorectal cancer (CRC) holds the distinction of being the most prevalent malignant tumor affecting the digestive system. It is a formidable global health challenge, as it ranks as the fourth leading cause of cancer-related fatalities around the world. Despite considerable advancements in comprehending and addressing colorectal cancer (CRC), the likelihood of recurring tumors and metastasis remains a major cause of high morbidity and mortality rates during treatment. Currently, colonoscopy is the predominant method for CRC screening. Artificial intelligence has emerged as a promising tool in aiding the diagnosis of polyps, which have demonstrated significant potential. Unfortunately, most segmentation methods face challenges in terms of limited accuracy and generalization to different datasets, especially the slow processing and analysis speed has become a major obstacle. In this study, we propose a fast and efficient polyp segmentation framework based on the Large-Kernel Receptive Field Block (LK-RFB) and Global Parallel Partial Decoder(GPPD). Our proposed ColonNet has been extensively tested and proven effective, achieving a DICE coefficient of over 0.910 and an FPS of over 102 on the CVC-300 dataset. In comparison to the state-of-the-art (SOTA) methods, ColonNet outperforms or achieves comparable performance on five publicly available datasets, establishing a new SOTA. Compared to state-of-the-art methods, ColonNet achieves the highest FPS (over 102 FPS) while maintaining excellent segmentation results, achieving the best or comparable performance on the five public datasets. The code will be released at: https://github.com/SPECTRELWF/ColonNet.
RESUMO
Accurate equipment operation trend prediction plays an important role in ensuring the safe operation of equipment and reducing maintenance costs. Therefore, monitoring the equipment vibration and predicting the time series of the vibration trend is one of the effective means to prevent equipment failures. In order to reduce the error of equipment operation trend prediction, this paper proposes a method for equipment operation trend prediction based on a combination of signal decomposition and an Informer prediction model. Aiming at the problem of high noise in vibration signals, which makes it difficult to obtain intrinsic characteristics when directly using raw data for prediction, the original signal is decomposed once using the variational mode decomposition (VMD) algorithm optimized by the improved sparrow search algorithm (ISSA) to obtain the intrinsic mode function (IMF) for different frequencies and calculate the fuzzy entropy. The improved adaptive white noise complete set empirical mode decomposition (ICEEMDAN) is used to decompose the components with the largest fuzzy entropy to obtain a series of intrinsic mode components, fully combining the advantages of the Informer model in processing long time series, and predict equipment operation trend data. Input all subsequences into the Informer model and reconstruct the results to obtain the predicted results. The experimental results indicate that the proposed method can effectively improve the accuracy of equipment operation trend prediction compared to other models.
Assuntos
Aprendizado Profundo , Vibração , Algoritmos , Entropia , Falha de EquipamentoRESUMO
MOTIVATION: Gene regulatory networks (GRNs) arise from the intricate interactions between transcription factors (TFs) and their target genes during the growth and development of organisms. The inference of GRNs can unveil the underlying gene interactions in living systems and facilitate the investigation of the relationship between gene expression patterns and phenotypic traits. Although several machine-learning models have been proposed for inferring GRNs from single-cell RNA sequencing (scRNA-seq) data, some of these models, such as Boolean and tree-based networks, suffer from sensitivity to noise and may encounter difficulties in handling the high noise and dimensionality of actual scRNA-seq data, as well as the sparse nature of gene regulation relationships. Thus, inferring large-scale information from GRNs remains a formidable challenge. RESULTS: This study proposes a multilevel, multi-structure framework called a pseudo-Siamese GRN (PSGRN) for inferring large-scale GRNs from time-series expression datasets. Based on the pseudo-Siamese network, we applied a gated recurrent unit to capture the time features of each TF and target matrix and learn the spatial features of the matrices after merging by applying the DenseNet framework. Finally, we applied a sigmoid function to evaluate interactions. We constructed two maize sub-datasets, including gene expression levels and GRNs, using existing open-source maize multi-omics data and compared them to other GRN inference methods, including GENIE3, GRNBoost2, nonlinear ordinary differential equations, CNNC, and DGRNS. Our results show that PSGRN outperforms state-of-the-art methods. This study proposed a new framework: a PSGRN that allows GRNs to be inferred from scRNA-seq data, elucidating the temporal and spatial features of TFs and their target genes. The results show the model's robustness and generalization, laying a theoretical foundation for maize genotype-phenotype associations with implications for breeding work.
Assuntos
Redes Reguladoras de Genes , Melhoramento Vegetal , Regulação da Expressão Gênica , Fatores de Transcrição/genética , Fatores de Transcrição/metabolismo , AlgoritmosRESUMO
Cooperative driver pathways discovery helps researchers to study the pathogenesis of cancer. However, most discovery methods mainly focus on genomics data, and neglect the known pathway information and other related multi-omics data; thus they cannot faithfully decipher the carcinogenic process. We propose CDPMiner (Cooperative Driver Pathways Miner) to discover cooperative driver pathways by multiplex network embedding, which can jointly model relational and attribute information of multi-type molecules. CDPMiner first uses the pathway topology to quantify the weight of genes in different pathways, and optimizes the relations between genes and pathways. Then it constructs an attributed multiplex network consisting of micro RNAs, long noncoding RNAs, genes and pathways, embeds the network through deep joint matrix factorization to mine more essential information for pathway-level analysis and reconstructs the pathway interaction network. Finally, CDPMiner leverages the reconstructed network and mutation data to define the driver weight between pathways to discover cooperative driver pathways. Experimental results on Breast invasive carcinoma and Stomach adenocarcinoma datasets show that CDPMiner can effectively fuse multi-omics data to discover more driver pathways, which indeed cooperatively trigger cancers and are valuable for carcinogenesis analysis. Ablation study justifies CDPMiner for a more comprehensive analysis of cancer by fusing multi-omics data.
Assuntos
Algoritmos , Neoplasias da Mama , Humanos , Feminino , Genômica/métodos , Neoplasias da Mama/genética , Neoplasias da Mama/metabolismo , Mutação , Carcinogênese/genéticaRESUMO
MOTIVATION: Predicting the associations between human microbes and drugs (MDAs) is one critical step in drug development and precision medicine areas. Since discovering these associations through wet experiments is time-consuming and labor-intensive, computational methods have already been an effective way to tackle this problem. Recently, graph contrastive learning (GCL) approaches have shown great advantages in learning the embeddings of nodes from heterogeneous biological graphs (HBGs). However, most GCL-based approaches don't fully capture the rich structure information in HBGs. Besides, fewer MDA prediction methods could screen out the most informative negative samples for effectively training the classifier. Therefore, it still needs to improve the accuracy of MDA predictions. RESULTS: In this study, we propose a novel approach that employs the Structure-enhanced Contrastive learning and Self-paced negative sampling strategy for Microbe-Drug Association predictions (SCSMDA). Firstly, SCSMDA constructs the similarity networks of microbes and drugs, as well as their different meta-path-induced networks. Then SCSMDA employs the representations of microbes and drugs learned from meta-path-induced networks to enhance their embeddings learned from the similarity networks by the contrastive learning strategy. After that, we adopt the self-paced negative sampling strategy to select the most informative negative samples to train the MLP classifier. Lastly, SCSMDA predicts the potential microbe-drug associations with the trained MLP classifier. The embeddings of microbes and drugs learning from the similarity networks are enhanced with the contrastive learning strategy, which could obtain their discriminative representations. Extensive results on three public datasets indicate that SCSMDA significantly outperforms other baseline methods on the MDA prediction task. Case studies for two common drugs could further demonstrate the effectiveness of SCSMDA in finding novel MDA associations. AVAILABILITY: The source code is publicly available on GitHub https://github.com/Yue-Yuu/SCSMDA-master.
Assuntos
Desenvolvimento de Medicamentos , Medicina de Precisão , Humanos , SoftwareRESUMO
Predicting differential gene expression (DGE) from Histone modifications (HM) signal is crucial to understand how HM controls cell functional heterogeneity through influencing differential gene regulation. Most existing prediction methods use fixed-length bins to represent HM signals and transmit these bins into a single machine learning model to predict differential expression genes of single cell type or cell type pair. However, the inappropriate bin length may cause the splitting of the important HM segment and lead to information loss. Furthermore, the bias of single learning model may limit the prediction accuracy. Considering these problems, in this paper, we proposes an Ensemble deep neural networks framework for predicting Differential Gene Expression (EnDGE). EnDGE employs different feature extractors on input HM signal data with different bin lengths and fuses the feature vectors for DGE prediction. Ensemble multiple learning models with different HM signal cutting strategies helps to keep the integrity and consistency of genetic information in each signal segment, and offset the bias of individual models. Besides the popular feature extractors, we also propose a new Residual Network based model with higher prediction accuracy to increase the diversity of feature extractors. Experiments on the real datasets from the Roadmap Epigenome Project (REMC) show that for all cell type pairs, EnDGE significantly outperforms the state-of-the-art baselines for differential gene expression prediction.
Assuntos
Código das Histonas , Processamento de Proteína Pós-Traducional , Código das Histonas/genética , Redes Neurais de Computação , Regulação da Expressão Gênica , Expressão GênicaRESUMO
MOTIVATION: CircularRNA (circRNA) is a class of noncoding RNA with high conservation and stability, which is considered as an important disease biomarker and drug target. Accumulating pieces of evidence have indicated that circRNA plays a crucial role in the pathogenesis and progression of many complex diseases. As the biological experiments are time-consuming and labor-intensive, developing an accurate computational prediction method has become indispensable to identify disease-related circRNAs. RESULTS: We presented a hybrid graph representation learning framework, named GraphCDA, for predicting the potential circRNA-disease associations. Firstly, the circRNA-circRNA similarity network and disease-disease similarity network were constructed to characterize the relationships of circRNAs and diseases, respectively. Secondly, a hybrid graph embedding model combining Graph Convolutional Networks and Graph Attention Networks was introduced to learn the feature representations of circRNAs and diseases simultaneously. Finally, the learned representations were concatenated and employed to build the prediction model for identifying the circRNA-disease associations. A series of experimental results demonstrated that GraphCDA outperformed other state-of-the-art methods on several public databases. Moreover, GraphCDA could achieve good performance when only using a small number of known circRNA-disease associations as the training set. Besides, case studies conducted on several human diseases further confirmed the prediction capability of GraphCDA for predicting potential disease-related circRNAs. In conclusion, extensive experimental results indicated that GraphCDA could serve as a reliable tool for exploring the regulatory role of circRNAs in complex diseases.
Assuntos
Biologia Computacional , RNA Circular , Biomarcadores , Biologia Computacional/métodos , Humanos , PolímerosRESUMO
MOTIVATION: High-resolution annotation of gene functions is a central task in functional genomics. Multiple proteoforms translated from alternatively spliced isoforms from a single gene are actual function performers and greatly increase the functional diversity. The specific functions of different isoforms can decipher the molecular basis of various complex diseases at a finer granularity. Multi-instance learning (MIL)-based solutions have been developed to distribute gene(bag)-level Gene Ontology (GO) annotations to isoforms(instances), but they simply presume that a particular annotation of the gene is responsible by only one isoform, neglect the hierarchical structures and semantics of massive GO terms (labels), or can only handle dozens of terms. RESULTS: We propose an efficacy approach IsofunGO to differentiate massive functions of isoforms by GO embedding. Particularly, IsofunGO first introduces an attributed hierarchical network to model massive GO terms, and a GO network embedding strategy to learn compact representations of GO terms and project GO annotations of genes into compressed ones, this strategy not only explores and preserves hierarchy between GO terms but also greatly reduces the prediction load. Next, it develops an attention-based MIL network to fuse genomics and transcriptomics data of isoforms and predict isoform functions by referring to compressed annotations. Extensive experiments on benchmark datasets demonstrate the efficacy of IsofunGO. Both the GO embedding and attention mechanism can boost the performance and interpretability. AVAILABILITYAND IMPLEMENTATION: The code of IsofunGO is available at http://www.sdu-idea.cn/codes.php?name=IsofunGO. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Assuntos
Biologia Computacional , Semântica , Ontologia Genética , Anotação de Sequência Molecular , Isoformas de Proteínas/genéticaRESUMO
Personalized federated learning (PFL) learns a personalized model for each client in a decentralized manner, where each client owns private data that are not shared and data among clients are non-independent and identically distributed (i.i.d.) However, existing PFL solutions assume that clients have sufficient training samples to jointly induce personalized models. Thus, existing PFL solutions cannot perform well in a few-shot scenario, where most or all clients only have a handful of samples for training. Furthermore, existing few-shot learning (FSL) approaches typically need centralized training data; as such, these FSL methods are not applicable in decentralized scenarios. How to enable PFL with limited training samples per client is a practical but understudied problem. In this article, we propose a solution called personalized federated few-shot learning (pFedFSL) to tackle this problem. Specifically, pFedFSL learns a personalized and discriminative feature space for each client by identifying which models perform well on which clients, without exposing local data of clients to the server and other clients, and which clients should be selected for collaboration with the target client. In the learned feature spaces, each sample is made closer to samples of the same category and farther away from samples of different categories. Experimental results on four benchmark datasets demonstrate that pFedFSL outperforms competitive baselines across different settings.
RESUMO
Different cancer types not only have common characteristics but also have their own characteristics respectively. The mechanism of these specific and common characteristics is still unclear. Pan-cancer analysis can help understand the similarities and differences among cancer types by systematically describing different patterns in cancers and identifying cancer-specific and cancer-common molecular biomarkers. While long non-coding RNAs (lncRNAs) are key cancer modulators, there is still a lack of pan-cancer analysis for lncRNA methylation dysregulation. In this study, we integrated lncRNA methylation, lncRNA expression and mRNA expression data to illuminate specific and common lncRNA methylation patterns in 23 cancer types. Then, we screened aberrantly methylated lncRNAs that negatively regulated lncRNA expression and mapped them to the ceRNA relationship for further validation. 29 lncRNAs were identified as diagnostic biomarkers for their corresponding cancer types, with lncRNA AC027601 was identified as a new KIRC-associated biomarker, and lncRNA ACTA2-AS1 was regarded as a carcinogenic factor of KIRP. Two lncRNAs HOXA-AS2 and AC007228 were identified as pan-cancer biomarkers. In general, the cancer-specific and cancer-common lncRNA biomarkers identified in this study may aid in cancer diagnosis and treatment.
RESUMO
Genome-phenome association (GPA) prediction can promote the understanding of biological mechanisms about complex pathology of phenotypes (i.e., traits and diseases). Traditional heterogeneous network-based GPA approaches overwhelmingly need to project heterogeneous data toward homogeneous network for data fusion and prediction, such projections result in the loss of heterogeneous network structure information. Matrix factorization based data fusion can avoid such projection by integrating multi-type data in a coherent way, but they typically perform linear factorization and cannot mine the nonlinear relationships between molecules, which compromise the accuracy of GPA analysis. Furthermore, most of them can not selectively synergy network topology and node attribution information in a principle way. In this paper, we propose a weighted deep matrix factorization based solution (WDGPA) to predict GPAs by selectively and differentially fusing heterogeneous molecular network and diverse attributes of nodes. WDGPA firstly assigns weights to inter/intra-relational data matrices and attribute data matrices, and performs deep matrix factorization on these matrices of heterogeneous network in a cooperative manner to obtain the nonlinear representations of different nodes. In addition, it performs low-rank representation learning on the attribute data with the shared nonlinear representations. In this way, both the network topology and node attributes are jointly mined to explore the representations of molecules and complex interplays between molecules and phenotypes. WDGPA then uses the representational vectors of gene and phenotype nodes to predict GPAs. Experimental results on maize and human datasets confirm that WDGPA outperforms competitive methods by a large margin under different evaluation protocols.