Results 1 - 20 of 53
1.
Brief Bioinform ; 25(6), 2024 Sep 23.
Article in English | MEDLINE | ID: mdl-39413798

ABSTRACT

The spatial reconstruction of single-cell RNA sequencing (scRNA-seq) data into spatial transcriptomics (ST) is a rapidly evolving field that addresses the significant challenge of aligning gene expression profiles to their spatial origins within tissues. This task is complicated by the inherent batch effects and the need for precise gene expression characterization to accurately reflect spatial information. To address these challenges, we developed SELF-Former, a transformer-based framework that utilizes multi-scale structures to learn gene representations, while designing spatial correlation constraints for the reconstruction of corresponding ST data. SELF-Former excels in recovering the spatial information of ST data and effectively mitigates batch effects between scRNA-seq and ST data. A novel aspect of SELF-Former is the introduction of a gene filtration module, which significantly enhances the spatial reconstruction task by selecting genes that are crucial for accurate spatial positioning and reconstruction. The superior performance and effectiveness of SELF-Former's modules have been validated across four benchmark datasets, establishing it as a robust and effective method for spatial reconstruction tasks. SELF-Former demonstrates its capability to extract meaningful gene expression information from scRNA-seq data and accurately map it to the spatial context of real ST data. Our method represents a significant advancement in the field, offering a reliable approach for spatial reconstruction.


Subject(s)
Single-Cell Analysis; Single-Cell Analysis/methods; Sequence Analysis, RNA/methods; Algorithms; Gene Expression Profiling/methods; Humans; Computational Biology/methods; Transcriptome; Software
2.
Article in English | MEDLINE | ID: mdl-39159016

ABSTRACT

Ensemble learning improves the capability of convolutional neural network (CNN)-based discriminators, whose performance is crucial to the quality of generated samples in a generative adversarial network (GAN). However, this learning strategy results in a significant increase in the number of parameters along with computational overhead. Meanwhile, the suitable number of discriminators required to enhance GAN performance is still being investigated. To mitigate these issues, we propose an evidential discriminator for GAN (EviD-GAN) to learn both the model (epistemic) and data (aleatoric) uncertainties; code is available at https://github.com/Tohokantche/EviD-GAN. Specifically, by analyzing three GAN models, the relation between the distribution of the discriminator's output and the generator performance has been discovered, yielding a general formulation of the GAN framework. With the above analysis, the evidential discriminator learns the degree of aleatoric and epistemic uncertainties by imposing a higher-order distribution constraint over the likelihood as expressed in the discriminator's output. This constraint can learn an ensemble of likelihood functions corresponding to an infinite set of discriminators. Thus, EviD-GAN aggregates knowledge through the ensemble learning of the discriminator, which allows the generator to benefit from an informative gradient flow at a negligible computational cost. Furthermore, inspired by the gradient direction in maximum mean discrepancy (MMD)-repulsive GAN, we design an asymmetric regularization scheme for EviD-GAN. Unlike MMD-repulsive GAN, which operates at the distribution level, our regularization scheme is based on a pairwise loss function, operates at the sample level, and is characterized by an asymmetric behavior during the training of generator and discriminator. Experimental results show that the proposed evidential discriminator is cost-effective, consistently improves GAN in terms of Fréchet inception distance (FID) and inception score (IS), and performs better than other competing models that use multiple discriminators.

3.
bioRxiv ; 2024 Apr 09.
Article in English | MEDLINE | ID: mdl-38645128

ABSTRACT

A main limitation of bulk transcriptomic technologies is that individual measurements normally contain contributions from multiple cell populations, impeding the identification of cellular heterogeneity within diseased tissues. To extract cellular insights from existing large cohorts of bulk transcriptomic data, we present CSsingle, a novel method designed to accurately deconvolve bulk data into a predefined set of cell types using a scRNA-seq reference. Through comprehensive benchmark evaluations and analyses using diverse real data sets, we reveal the systematic bias inherent in existing methods, stemming from differences in cell size or library size. Our extensive experiments demonstrate that CSsingle exhibits superior accuracy and robustness compared to leading methods, particularly when dealing with bulk mixtures originating from cell types of markedly different cell sizes, as well as when handling bulk and single-cell reference data obtained from diverse sources. Our work provides an efficient and robust methodology for the integrated analysis of bulk and scRNA-seq data, facilitating various biological and clinical studies.

4.
Article in English | MEDLINE | ID: mdl-37590112

ABSTRACT

As an effective means of ocular disease recognition, early fundus screening can help patients avoid irreversible blindness. Although deep learning is powerful for image-based ocular disease recognition, its performance depends mainly on a large amount of labeled data. For ocular disease, data collection and annotation at a single site usually take a lot of time. If multi-site data are obtained, there are two main issues: 1) data privacy is easily compromised; 2) the domain gap among sites will influence the recognition performance. Motivated by these issues, first, a Gaussian randomized mechanism is adopted in local sites, which are then engaged in a global model to preserve the data privacy of local sites and models. Second, to bridge the domain gap among different sites, a two-step domain adaptation method is introduced, which consists of a domain confusion module and a multi-expert learning strategy. Based on the above, a privacy-preserving federated learning framework with domain adaptation is constructed. In the experiments, a multi-disease early fundus screening dataset is used, with a detailed ablation study and four experimental settings, to show the stepwise performance, which verifies the efficiency of our proposed framework.
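
The abstract above does not spell out how its Gaussian randomized mechanism is applied. As a rough illustration only, the following sketch shows one common pattern in privacy-preserving federated learning: each site clips its local update, adds zero-mean Gaussian noise, and the server averages the noisy updates. The function names, the clipping bound `max_norm`, the noise multiplier `sigma`, and the sample updates are all assumptions made for this example, not details taken from the paper.

```python
import math
import random


def clip_update(update, max_norm):
    """Scale the update so that its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(x * x for x in update))
    if norm <= max_norm or norm == 0.0:
        return list(update)
    scale = max_norm / norm
    return [x * scale for x in update]


def privatize(update, max_norm, sigma, rng):
    """Clip the update, then add Gaussian noise with std sigma * max_norm."""
    clipped = clip_update(update, max_norm)
    return [x + rng.gauss(0.0, sigma * max_norm) for x in clipped]


def federated_average(updates):
    """Coordinate-wise mean of the updates received from all sites."""
    n = len(updates)
    return [sum(column) / n for column in zip(*updates)]


rng = random.Random(0)
# Hypothetical local model updates from three sites (two parameters each).
site_updates = [[0.5, -1.0], [3.0, 4.0], [-0.2, 0.1]]
noisy_updates = [privatize(u, max_norm=1.0, sigma=0.1, rng=rng) for u in site_updates]
global_update = federated_average(noisy_updates)
print(global_update)
```

Clipping bounds the influence of any single site before noise is added; how the noise scale maps to a formal privacy guarantee depends on the accounting method and is not covered by this sketch.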

5.
Article in English | MEDLINE | ID: mdl-37028079

ABSTRACT

In this work, we study a more realistic and challenging scenario in multiview clustering (MVC), referred to as incomplete MVC (IMVC), where some instances in certain views are missing. The key to IMVC is how to adequately exploit complementary and consistency information under the incompleteness of data. However, most existing methods address the incompleteness problem at the instance level, and they require sufficient information to perform data recovery. In this work, we develop a new approach to facilitate IMVC based on the graph propagation perspective. Specifically, a partial graph is used to describe the similarity of samples for incomplete views, such that the issue of missing instances can be translated into the missing entries of the partial graph. In this way, a common graph can be adaptively learned to self-guide the propagation process by exploiting the consistency information, and the propagated graph of each view is in turn used to refine the common self-guided graph in an iterative manner. Thus, the associated missing entries can be inferred through graph propagation by exploiting the consistency information across all views. On the other hand, existing approaches focus on the consistency structure only, and the complementary information has not been sufficiently exploited due to the data incompleteness issue. By contrast, under the proposed graph propagation framework, an exclusive regularization term can be naturally adopted to exploit the complementary information in our method. Extensive experiments demonstrate the effectiveness of the proposed method in comparison with state-of-the-art methods. The source code of our method is available at https://github.com/CLiu272/TNNLS-PGP.

6.
IEEE Trans Biomed Eng ; 70(1): 307-317, 2023 01.
Article in English | MEDLINE | ID: mdl-35820001

ABSTRACT

Advances in high-throughput experimental methods have led to the availability of more diverse omic datasets in clinical analysis applications. Different types of omic data reveal different cellular aspects and contribute to the understanding of disease progression from these aspects. While survival prediction and subgroup identification are two important research problems in clinical analysis, their performance can be further boosted by taking advantage of multiple omics data through multi-view learning. However, these two tasks are generally studied separately, and the possibility that they could reinforce each other through collaborative learning has not been adequately considered. In light of this, we propose a View-aware Collaborative Learning (VaCoL) method to jointly boost the performance of survival prediction and subgroup identification by integrating multiple omics data. Specifically, survival analysis and affinity learning, which respectively perform survival prediction and subgroup identification, are integrated into a unified optimization framework to learn the two tasks in a collaborative way. In addition, by considering the diversity of different types of data, we make use of the log-rank test statistic to evaluate the importance of different views. As a result, the proposed approach can adaptively learn the optimal weight for each view during training. Empirical results on several real datasets show that our method is able to significantly improve the performance of survival prediction and subgroup identification. A detailed model analysis study is also provided to show the effectiveness of the proposed collaborative learning and view-weight learning approaches.


Subject(s)
Interdisciplinary Placement; Machine Learning; Learning; Survival Analysis
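
The VaCoL abstract above relies on the log-rank test statistic to score views. As background, here is a minimal sketch of the standard two-group log-rank statistic in plain Python. It only illustrates the statistic itself; how VaCoL converts it into view weights is not described in the abstract and is not attempted here. The function name and the toy survival data are hypothetical.

```python
def logrank_statistic(times_a, events_a, times_b, events_b):
    """Two-group log-rank chi-square statistic (1 degree of freedom).

    times_*  : observed times (event or censoring)
    events_* : 1 if the event was observed, 0 if the subject was censored
    """
    data = [(t, e, 0) for t, e in zip(times_a, events_a)]
    data += [(t, e, 1) for t, e in zip(times_b, events_b)]
    event_times = sorted({t for t, e, _ in data if e})

    observed_a = expected_a = variance = 0.0
    for t in event_times:
        # Subjects still at risk just before time t, per group.
        n_a = sum(1 for s, _, g in data if s >= t and g == 0)
        n_b = sum(1 for s, _, g in data if s >= t and g == 1)
        # Events occurring exactly at time t.
        d_a = sum(1 for s, e, g in data if s == t and e and g == 0)
        d_b = sum(1 for s, e, g in data if s == t and e and g == 1)
        n, d = n_a + n_b, d_a + d_b
        observed_a += d_a
        expected_a += d * n_a / n
        if n > 1:
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)

    if variance == 0.0:
        return 0.0
    return (observed_a - expected_a) ** 2 / variance


# Hypothetical toy cohorts: group A fails early, group B fails late.
stat = logrank_statistic([1, 2, 3], [1, 1, 1], [4, 5, 6], [1, 1, 1])
print(stat)
```

A larger statistic means the two survival curves differ more clearly, which is why a statistic of this kind can serve as a signal of how informative a grouping is.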
7.
IEEE Trans Neural Netw Learn Syst ; 33(2): 654-666, 2022 Feb.
Article in English | MEDLINE | ID: mdl-33079681

ABSTRACT

Recently, multitask learning has been successfully applied to survival analysis problems. A critical challenge in real-world survival analysis tasks is that not all instances and tasks are equally learnable. A survival analysis model can be improved by considering the complexities of instances and tasks during model training. To this end, we propose an asymmetric graph-guided multitask learning approach with self-paced learning for survival analysis applications. The proposed model is able to improve the learning performance by identifying the complex structure among tasks and considering the complexities of training instances and tasks during model training. In particular, by incorporating the self-paced learning strategy and asymmetric graph-guided regularization, the proposed model is able to learn in a progressive way from "easy" to "hard" loss function items. In addition, together with the self-paced learning function, the asymmetric graph-guided regularization allows related knowledge to be transferred from one task to another in an asymmetric way. Consequently, the knowledge acquired from earlier learned tasks can help to solve complex tasks effectively. The experimental results on both synthetic and real-world TCGA data suggest that the proposed method is indeed useful for improving survival analysis and achieves higher prediction accuracies than the previous state-of-the-art methods.
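
The "easy to hard" progression described in the abstract above can be illustrated with the classic hard-threshold form of self-paced learning, where an item is included only if its current loss falls below a pace threshold that grows over rounds. This is a generic sketch under that assumption; the paper's own self-paced function and its graph-guided regularizer are not reproduced, and the function names, losses, and schedule values are made up for illustration.

```python
def self_paced_weights(losses, threshold):
    """Hard self-paced weights: 1.0 for items with loss below the threshold."""
    return [1.0 if loss < threshold else 0.0 for loss in losses]


def pace_schedule(losses, start, growth, rounds):
    """Selection masks over several rounds as the threshold grows.

    For brevity the losses are held fixed; in real training they would be
    recomputed after each model update.
    """
    threshold = start
    schedule = []
    for _ in range(rounds):
        schedule.append(self_paced_weights(losses, threshold))
        threshold *= growth
    return schedule


# Hypothetical per-item losses: one easy, one medium, one hard item.
for mask in pace_schedule([0.1, 0.5, 2.0], start=0.3, growth=2.0, rounds=4):
    print(mask)
```

With these values the easy item is selected first, the medium item joins in the second round, and the hard item is admitted only once the threshold has grown past its loss.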

8.
IEEE Trans Cybern ; 52(5): 3658-3668, 2022 May.
Article in English | MEDLINE | ID: mdl-32924945

ABSTRACT

Ensemble learning has many successful applications because of its effectiveness in boosting the predictive performance of classification models. In this article, we propose a semisupervised multiple choice learning (SemiMCL) approach to jointly train a network ensemble on partially labeled data. Our model mainly focuses on improving the labeled data assignment among the constituent networks and exploiting unlabeled data to capture domain-specific information, such that semisupervised classification can be effectively facilitated. Different from conventional multiple choice learning models, the constituent networks learn multiple tasks in the training process. Specifically, an auxiliary reconstruction task is included to learn domain-specific representation. For the purpose of performing implicit labeling on reliable unlabeled samples, we adopt a negative l1-norm regularization when minimizing the conditional entropy with respect to the posterior probability distribution. Extensive experiments on multiple real-world datasets are conducted to verify the effectiveness and superiority of the proposed SemiMCL model.


Subject(s)
Learning; Supervised Machine Learning
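
The SemiMCL abstract above minimizes conditional entropy on unlabeled samples. The sketch below shows only the generic quantity involved, namely the mean Shannon entropy of the predicted class posteriors; the paper's negative l1-norm regularizer is not specified in enough detail in the abstract to reproduce, so it is left out. The function names and example posteriors are assumptions for illustration.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of one predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def mean_conditional_entropy(posteriors):
    """Average entropy over the predicted posteriors of unlabeled samples."""
    return sum(entropy(p) for p in posteriors) / len(posteriors)


# Hypothetical posteriors: one confident prediction, one ambiguous prediction.
unlabeled_posteriors = [[0.9, 0.1], [0.5, 0.5]]
print(mean_conditional_entropy(unlabeled_posteriors))
```

Driving this quantity down pushes predictions on unlabeled samples toward confident, near one-hot outputs, which is the sense in which entropy minimization acts as implicit labeling.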
9.
IEEE/ACM Trans Comput Biol Bioinform ; 19(2): 1193-1202, 2022.
Article in English | MEDLINE | ID: mdl-32750893

ABSTRACT

Identifying cancer subtypes by integration of multi-omic data is beneficial to improve the understanding of disease progression, and provides more precise treatment for patients. Cancer subtype identification is usually accomplished by clustering patients with unsupervised learning approaches. Thus, most existing integrative cancer subtyping methods are performed in an entirely unsupervised way. An integrative cancer subtyping approach can be improved to discover clinically more relevant cancer subtypes when considering the clinical survival response variables. In this study, we propose a Survival Supervised Graph Clustering (S2GC) method for cancer subtyping that takes survival information into consideration. Specifically, we use a graph to represent the similarity of patients, and develop a multi-omic survival analysis embedding with patient-to-patient similarity graph learning for cancer subtype identification. The multi-view (omic) survival analysis model and the graph of patients are jointly learned in a unified way. The learned optimal graph can be utilized to cluster cancer subtypes directly. In the proposed model, the survival analysis model and adaptive graph learning could positively reinforce each other. Consequently, the survival time can be considered as supervised information to improve the quality of the similarity graph and explore clinically more relevant subgroups of patients. Experiments on several representative multi-omic cancer datasets demonstrate that the proposed method achieves better results than a number of state-of-the-art methods. The results also suggest that our method is able to identify biologically meaningful subgroups for different cancer types. (Our Matlab source code is available online at GitHub: https://github.com/CLiu272/S2GC).


Subject(s)
Algorithms; Neoplasms; Cluster Analysis; Humans; Neoplasms/genetics; Software; Survival Analysis
10.
IEEE Trans Neural Netw Learn Syst ; 33(1): 75-88, 2022 Jan.
Article in English | MEDLINE | ID: mdl-33048763

ABSTRACT

Graph-based methods have achieved impressive performance on semisupervised classification (SSC). Traditional graph-based methods have two main drawbacks. First, the graph is predefined before training a classifier, which does not leverage the interactions between the classifier training and similarity matrix learning. Second, when handling high-dimensional data with noisy or redundant features, the graph constructed in the original input space is actually unsuitable and may lead to poor performance. In this article, we propose an SSC method with novel graph construction (SSC-NGC), in which the similarity matrix is optimized in both the label space and an additional subspace to get a better and more robust result than in the original data space. Furthermore, to obtain a high-quality subspace, we learn the projection matrix of the additional subspace by preserving the local and global structure of the data. Finally, we integrate the classifier training, the graph construction, and the subspace learning into a unified framework. With this framework, the classifier parameters, similarity matrix, and projection matrix of the subspace are adaptively learned in an iterative scheme to obtain an optimal joint result. We conduct extensive comparative experiments against state-of-the-art methods over multiple real-world data sets. Experimental results demonstrate the superiority of the proposed method over other state-of-the-art algorithms.

11.
IEEE Trans Image Process ; 30: 5807-5818, 2021.
Article in English | MEDLINE | ID: mdl-34138710

ABSTRACT

Both target-specific and domain-invariant features can facilitate Open Set Domain Adaptation (OSDA). To exploit these features, we propose a Knowledge Exchange (KnowEx) model which jointly trains two complementary constituent networks: (1) a Domain-Adversarial Network (DAdvNet) learning the domain-invariant representation, through which the supervision in source domain can be exploited to infer the class information of unlabeled target data; (2) a Private Network (PrivNet) exclusive for target domain, which is beneficial for discriminating between instances from known and unknown classes. The two constituent networks exchange training experience in the learning process. Toward this end, we exploit an adversarial perturbation process against DAdvNet to regularize PrivNet. This enhances the complementarity between the two networks. At the same time, we incorporate an adaptation layer into DAdvNet to address the unreliability of the PrivNet's experience. Therefore, DAdvNet and PrivNet are able to mutually reinforce each other during training. We have conducted thorough experiments on multiple standard benchmarks to verify the effectiveness and superiority of KnowEx in OSDA.

12.
IEEE Trans Cybern ; 51(4): 2019-2031, 2021 Apr.
Article in English | MEDLINE | ID: mdl-31180903

ABSTRACT

Healthcare question answering (HQA) systems play a vital role in encouraging patients to seek professional consultation. However, there are some challenging factors in learning and representing the question corpus of HQA datasets, such as high dimensionality, sparseness, noise, and nonprofessional expression. To address these issues, we propose an inception convolutional autoencoder model for Chinese healthcare question clustering (ICAHC). First, we select a set of kernels with different sizes using convolutional autoencoder networks to explore both the diversity and quality in the clustering ensemble. Thus, these kernels encourage the capture of diverse representations. Second, we design four ensemble operators to merge representations based on whether they are independent, and input them into the encoder using different skip connections. Third, the model maps features from the encoder into a lower-dimensional space, followed by clustering. We conduct comparative experiments against other clustering algorithms on a Chinese healthcare dataset. Experimental results show the effectiveness of ICAHC in discovering better clustering solutions. The results can be used in the prediction of patients' conditions and the development of an automatic HQA system.


Subject(s)
Cluster Analysis; Delivery of Health Care/methods; Diagnosis, Computer-Assisted/methods; Neural Networks, Computer; Algorithms; China; Humans
13.
IEEE Trans Neural Netw Learn Syst ; 32(8): 3593-3607, 2021 Aug.
Article in English | MEDLINE | ID: mdl-32845845

ABSTRACT

Semisupervised clustering methods improve performance by randomly selecting pairwise constraints, which may lead to redundancy and instability. In this context, active clustering is proposed to maximize the efficacy of annotations by effectively using pairwise constraints. However, existing methods lack an overall consideration of the querying criteria and repeatedly run semisupervised clustering to update labels. In this work, we first propose an active density peak (ADP) clustering algorithm that considers both representativeness and informativeness. Representative instances are selected to capture data patterns, while informative instances are queried to reduce the uncertainty of clustering results. Meanwhile, we design a fast update strategy to update labels efficiently. In addition, we propose an active clustering ensemble framework that combines local and global uncertainties to query the most ambiguous instances for better separation between the clusters. A weighted voting consensus method is introduced for better integration of clustering results. We conducted experiments comparing our methods with state-of-the-art methods on real-world data sets. Experimental results demonstrate the effectiveness of our methods.
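
The ADP algorithm in the abstract above builds on density peak clustering. As background, the sketch below computes the two quantities that the basic (non-active) density peak method uses: a local density `rho` (neighbors within a cutoff) and `delta` (distance to the nearest point of higher density). The active querying and fast label update steps of ADP are not shown. The function name, cutoff, and sample points are assumptions for illustration.

```python
import math


def density_peak_scores(points, cutoff):
    """Return (rho, delta) for each point, as in basic density peak clustering.

    rho[i]   : number of other points closer than `cutoff` to point i
    delta[i] : distance to the nearest point with strictly higher rho, or the
               largest distance from point i if no such point exists
    """
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    rho = [sum(1 for j in range(n) if j != i and dist[i][j] < cutoff) for i in range(n)]
    delta = []
    for i in range(n):
        higher = [dist[i][j] for j in range(n) if rho[j] > rho[i]]
        delta.append(min(higher) if higher else max(dist[i]))
    return rho, delta


# Hypothetical 2-D data: a tight group of three points and a distant pair.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11)]
rho, delta = density_peak_scores(points, cutoff=1.5)
print(rho)
print(delta)
```

Points with both high `rho` and high `delta` are candidate cluster centers, which is one natural reading of the "representative instances" mentioned in the abstract.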

14.
IEEE Trans Cybern ; 50(1): 74-86, 2020 Jan.
Article in English | MEDLINE | ID: mdl-30137022

ABSTRACT

Multitask feature selection (MTFS) methods have become more important for many real-world applications, especially in a high-dimensional setting. The most widely used assumption is that all tasks share the same features, and the l2,1 regularization method is usually applied. However, this assumption may not hold when the correlations among tasks are not obvious. Learning unrelated tasks together may result in negative transfer and degrade the performance. In this paper, we present a flexible MTFS approach based on graph-clustered feature sharing. To avoid the above limitation, we adopt a graph to represent the relevance among tasks instead of adopting a hard task set partition. Furthermore, we propose a graph-guided regularization approach such that the sparsity of the solution can be achieved on both the task level and the feature level, and a variant of the smooth proximal gradient method is developed to solve the corresponding optimization problem. The proposed method has been evaluated on multitask regression and multitask binary classification problems. Extensive experiments on synthetic datasets and real-world datasets demonstrate the effectiveness of the proposed approach in capturing task structure.
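
For readers unfamiliar with the l2,1 regularizer mentioned in the abstract above, the following sketch computes the l2,1 norm of a features-by-tasks weight matrix and applies its standard proximal operator (row-wise group soft-thresholding). This is the textbook baseline that the abstract contrasts against, not the paper's graph-guided regularizer or its smooth proximal gradient variant. The function names and the example matrix are assumptions.

```python
import math


def l21_norm(weights):
    """Sum over rows (features) of the L2 norm across columns (tasks)."""
    return sum(math.sqrt(sum(w * w for w in row)) for row in weights)


def group_soft_threshold(weights, lam):
    """Proximal operator of lam * l2,1: shrink each row toward zero as a group."""
    shrunk = []
    for row in weights:
        norm = math.sqrt(sum(w * w for w in row))
        scale = max(0.0, 1.0 - lam / norm) if norm > 0.0 else 0.0
        shrunk.append([scale * w for w in row])
    return shrunk


# Hypothetical weights for two features (rows) shared across two tasks (columns).
W = [[3.0, 4.0], [0.3, 0.4]]
print(l21_norm(W))
print(group_soft_threshold(W, lam=1.0))
```

Because a whole row is zeroed at once, a feature is either kept for every task or dropped for every task, which is exactly the "all tasks share the same features" assumption that the abstract sets out to relax.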

15.
IEEE Trans Cybern ; 50(6): 2872-2885, 2020 Jun.
Article in English | MEDLINE | ID: mdl-30596592

ABSTRACT

Clustering ensemble (CE) takes multiple clustering solutions into consideration in order to effectively improve the accuracy and robustness of the final result. To reduce redundancy as well as noise, a CE selection (CES) step is added to further enhance performance. Quality and diversity are two important metrics of CES. However, most of the CES strategies adopt heuristic selection methods or a threshold parameter setting to achieve tradeoff between quality and diversity. In this paper, we propose a transfer CES (TCES) algorithm which makes use of the relationship between quality and diversity in a source dataset, and transfers it into a target dataset based on three objective functions. Furthermore, a multiobjective self-evolutionary process is designed to optimize these three objective functions. Finally, we construct a transfer CE framework (TCE-TCES) based on TCES to obtain better clustering results. The experimental results on 12 transfer clustering tasks obtained from the 20newsgroups dataset show that TCE-TCES can find a better tradeoff between quality and diversity, as well as obtaining more desirable clustering results.

16.
IEEE Trans Neural Netw Learn Syst ; 31(4): 1387-1400, 2020 04.
Article in English | MEDLINE | ID: mdl-31265410

ABSTRACT

The class imbalance problem has become a leading challenge. Although conventional imbalance learning methods are proposed to tackle this problem, they have some limitations: 1) undersampling methods suffer from losing important information and 2) cost-sensitive methods are sensitive to outliers and noise. To address these issues, we propose a hybrid optimal ensemble classifier framework that combines density-based undersampling and cost-effective methods by exploring state-of-the-art solutions using a multi-objective optimization algorithm. Specifically, we first develop a density-based undersampling method to select informative samples from the original training data with probability-based data transformation, which makes it possible to obtain multiple subsets following a balanced distribution across classes. Second, we exploit the cost-sensitive classification method to address the incompleteness of information problem by modifying the weights of misclassified minority samples rather than the majority ones. Finally, we introduce a multi-objective optimization procedure and utilize connections between samples to self-modify the classification result using an ensemble classifier framework. Extensive comparative experiments conducted on real-world data sets demonstrate that our method outperforms the majority of imbalance and ensemble classification approaches.

17.
Article in English | MEDLINE | ID: mdl-31603782

ABSTRACT

In this paper, we explore how to leverage readily available unlabeled data to improve semi-supervised human detection performance. For this purpose, we specifically modify the region proposal network (RPN) for learning on a partially labeled dataset. Based on commonly observed false positive types, a verification module is developed to assess foreground human objects in the candidate regions to provide an important cue for filtering the RPN's proposals. The remaining proposals with high confidence scores are then used as pseudo annotations for re-training our detection model. To reduce the risk of error propagation in the training process, we adopt a self-paced training strategy to progressively include more pseudo annotations generated by the previous model over multiple training rounds. The resulting detector re-trained on the augmented data can be expected to have better detection performance. The effectiveness of the main components of this framework is verified through extensive experiments, and the proposed approach achieves state-of-the-art detection results on multiple scene-specific human detection benchmarks in the semi-supervised setting.

18.
Article in English | MEDLINE | ID: mdl-31425030

ABSTRACT

Using an ensemble of neural networks with consistency regularization is effective for improving performance and stability of deep learning, compared to the case of a single network. In this paper, we present a semi-supervised Deep Coupled Ensemble (DCE) model, which contributes to ensemble learning and classification landmark exploration for better locating the final decision boundaries in the learnt latent space. First, multiple complementary consistency regularizations are integrated into our DCE model to enable the ensemble members to learn from each other and themselves, such that training experience from different sources can be shared and utilized during training. Second, in view of the possibility of producing incorrect predictions on a number of difficult instances, we adopt class-wise mean feature matching to explore important unlabeled instances as classification landmarks, on which the model predictions are more reliable. Minimizing the weighted conditional entropy on unlabeled data is able to force the final decision boundaries to move away from important training data points, which facilitates semi-supervised learning. Ensemble members could eventually have similar performance due to consistency regularization, and thus only one of these members is needed during the test stage, such that the efficiency of our model is the same as the non-ensemble case. Extensive experimental results demonstrate the superiority of our proposed DCE model over existing state-of-the-art semi-supervised learning methods.

19.
Article in English | MEDLINE | ID: mdl-29989970

ABSTRACT

In gene expression data analysis, the problems of cancer classification and gene selection are closely related. Successfully selecting informative genes will significantly improve the classification performance. To identify informative genes from a large number of candidate genes, various methods have been proposed. However, the gene expression data may include some important correlation structures, and some of the genes can be divided into different groups based on their biological pathways. Many existing methods do not take into consideration the exact correlation structure within the data. Therefore, from both the knowledge discovery and biological perspectives, an ideal gene selection method should take this structural information into account. Moreover, better generalization performance can be obtained by discovering the correlation structure within the data. In order to discover structure information among data and improve learning performance, we propose a structured penalized logistic regression model which simultaneously performs feature selection and model learning for gene expression data analysis. An efficient coordinate descent algorithm has been developed to optimize the model. The numerical simulation studies demonstrate that our method is able to select the highly correlated features. In addition, the results from real gene expression datasets show that the proposed method performs competitively with respect to previous approaches.


Subject(s)
Computational Biology/methods; Gene Expression Profiling/methods; Logistic Models; Machine Learning; Cluster Analysis; Databases, Genetic; Humans; Neoplasms/classification; Neoplasms/genetics; Neoplasms/metabolism; Transcriptome/genetics
20.
IEEE Trans Cybern ; 49(2): 366-379, 2019 Feb.
Article in English | MEDLINE | ID: mdl-29989979

ABSTRACT

High-dimensional data classification with very limited labeled training data is a challenging task in the area of data mining. In order to tackle this task, we first propose a feature selection-based semi-supervised classifier ensemble framework (FSCE) to perform high-dimensional data classification. Then, we design an adaptive semi-supervised classifier ensemble framework (ASCE) to improve the performance of FSCE. When compared with FSCE, ASCE is characterized by an adaptive feature selection process, an adaptive weighting process (AWP), and an auxiliary training set generation process (ATSGP). The adaptive feature selection process generates a set of compact subspaces based on the selected attributes obtained by the feature selection algorithms, while the AWP associates each basic semi-supervised classifier in the ensemble with a weight value. The ATSGP enlarges the training set with unlabeled samples. In addition, a set of nonparametric tests is adopted to compare multiple semi-supervised classifier ensemble (SSCE) approaches over different datasets. The experiments on 20 high-dimensional real-world datasets show that: 1) the two adaptive processes in ASCE are useful for improving the performance of the SSCE approach and 2) ASCE works well on high-dimensional datasets with very limited labeled training data, and outperforms most state-of-the-art SSCE approaches.
