Results 1 - 20 of 54
1.
Brief Bioinform ; 23(1)2022 01 17.
Article in English | MEDLINE | ID: mdl-34607358

ABSTRACT

The discovery of cancer subtypes has become a much-researched topic in oncology. Dividing cancer patients into subtypes can enable personalized treatments for heterogeneous patients. High-throughput technologies provide multiple omics data for cancer subtyping. Many computational methods integrate multi-view data to identify cancer subtypes, yet they obtain different subtypes for the same cancer, even when using the same multi-omics data. To a certain extent, the subtypes from distinct methods are related, which may offer guidance for cancer subtyping. It is a challenge to effectively utilize the valuable information in distinct subtypes to produce more accurate and reliable subtypes. A weighted ensemble sparse latent representation (subtype-WESLR) is proposed to detect cancer subtypes on heterogeneous omics data. Using a weighted ensemble strategy to fuse base clusterings obtained by distinct methods as prior knowledge, subtype-WESLR projects each sample feature profile from each data type onto a common latent subspace while maintaining the local structure of the original sample feature space and consistency with the weighted ensemble, and it optimizes the common subspace by an iterative method to identify cancer subtypes. We conduct experiments on various synthetic datasets and eight public multi-view datasets from The Cancer Genome Atlas. The results demonstrate that subtype-WESLR outperforms competing methods by integrating the base clusterings of existing methods to obtain more precise subtypes.
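
The idea of fusing base clusterings through a weighted ensemble can be illustrated with a weighted co-association matrix. This is only a minimal sketch: the abstract does not say how subtype-WESLR builds or weights its ensemble, so the function name, the example labels and the weights below are all hypothetical.

```python
import numpy as np

def weighted_coassociation(labelings, weights):
    """Fuse base clusterings into one weighted co-association matrix.

    labelings: (n_methods, n_samples) cluster labels, one row per base method.
    weights:   one non-negative weight per base method (hypothetical values).
    """
    labelings = np.asarray(labelings)
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()  # normalise so entries lie in [0, 1]
    n_samples = labelings.shape[1]
    fused = np.zeros((n_samples, n_samples))
    for labels, w in zip(labelings, weights):
        # 1.0 where two samples share a cluster under this method, else 0.0
        same_cluster = (labels[:, None] == labels[None, :]).astype(float)
        fused += w * same_cluster
    return fused

# Three hypothetical base clusterings of four samples
base = [[0, 0, 1, 1], [0, 0, 0, 1], [1, 1, 0, 0]]
C = weighted_coassociation(base, weights=[2.0, 1.0, 1.0])
```

Each entry of `C` is the weighted fraction of base methods that place the two samples in the same cluster, which is one common way to encode agreement between clusterings as prior knowledge.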


Subjects
Algorithms , Neoplasms , Cluster Analysis , Humans , Neoplasms/genetics
2.
Brief Bioinform ; 23(1)2022 01 17.
Article in English | MEDLINE | ID: mdl-34607360

ABSTRACT

Learning node representation is a fundamental problem in biological network analysis, as compact representation features reveal complicated network structures and carry useful information for downstream tasks such as link prediction and node classification. Recently, multiple networks that profile objects from different aspects are increasingly accumulated, providing the opportunity to learn objects from multiple perspectives. However, the complex common and specific information across different networks poses challenges to node representation methods. Moreover, ubiquitous noise in networks calls for more robust representation. To deal with these problems, we present a representation learning method for multiple biological networks. First, we accommodate the noise and spurious edges in networks using denoised diffusion, providing robust connectivity structures for the subsequent representation learning. Then, we introduce a graph regularized integration model to combine refined networks and compute common representation features. By using the regularized decomposition technique, the proposed model can effectively preserve the common structural property of different networks and simultaneously accommodate their specific information, leading to a consistent representation. A simulation study shows the superiority of the proposed method on different levels of noisy networks. Three network-based inference tasks, including drug-target interaction prediction, gene function identification and fine-grained species categorization, are conducted using representation features learned from our method. Biological networks at different scales and levels of sparsity are involved. Experimental results on real-world data show that the proposed method has robust performance compared with alternatives. Overall, by eliminating noise and integrating effectively, the proposed method is able to learn useful representations from multiple biological networks.


Subjects
Learning , Neural Networks, Computer , Computer Simulation , Diffusion
3.
Brief Bioinform ; 22(5)2021 09 02.
Article in English | MEDLINE | ID: mdl-33822879

ABSTRACT

With diverse types of omics data widely available, many computational methods have been recently developed to integrate these heterogeneous data, providing a comprehensive understanding of diseases and biological mechanisms. However, most of them hardly take noise effects into account. Data-specific patterns unique to data types also make it challenging to uncover the consistent patterns and learn a compact representation of multi-omics data. Here we present DeFusion, a multi-omics integration method that addresses these issues. We explicitly model the error term in data reconstruction and simultaneously consider noise effects and data-specific patterns. We utilize a denoised network regularization in which we build a fused network using a denoising procedure to suppress noise effects and data-specific patterns. The error term collaborates with the denoised network regularization to capture data-specific patterns. We solve the optimization problem via an inexact alternating minimization algorithm. A comparative simulation study shows the method's superiority at discovering common patterns among data types at three noise levels. Transcriptomics-and-epigenomics integration, in seven cancer cohorts from The Cancer Genome Atlas, demonstrates that the learned integrative representation extracted in an unsupervised manner can depict survival information. Specifically, in liver hepatocellular carcinoma, the learned integrative representation attains an average Harrell's C-index of 0.78 in 10 repetitions of 3-fold cross-validation for survival prediction, which far exceeds competing methods, and we discover an aggressive subtype in liver hepatocellular carcinoma with this latent representation, which is validated by an external dataset, GSE14520. We also show that DeFusion is applicable to the integration of other omics types.
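
For readers unfamiliar with the evaluation metric, Harrell's C-index can be computed as sketched below. This is a generic textbook formulation, not code from the paper; the toy survival times, event flags and risk scores are invented for illustration.

```python
def harrell_c_index(times, events, risk_scores):
    """Fraction of comparable pairs whose predicted risks are correctly ordered.

    A pair (i, j) is comparable when subject i has an observed event and a
    strictly shorter follow-up time than subject j. Ties in risk count as 0.5.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # censored subjects cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable

# Invented toy data: the third subject is censored
times = [2, 4, 6, 8]
events = [1, 1, 0, 1]
perfect = harrell_c_index(times, events, [0.9, 0.7, 0.3, 0.1])
reversed_order = harrell_c_index(times, events, [0.1, 0.3, 0.7, 0.9])
uninformative = harrell_c_index(times, events, [0.5, 0.5, 0.5, 0.5])
```

A value of 1.0 means every comparable pair is ranked correctly, 0.5 corresponds to uninformative scores, so the reported 0.78 sits well above chance.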


Subjects
Carcinoma, Hepatocellular/genetics , Carcinoma, Hepatocellular/mortality , Epigenomics/methods , Liver Neoplasms/genetics , Liver Neoplasms/mortality , Transcriptome , Algorithms , Bayes Theorem , Cohort Studies , DNA Methylation/genetics , Deep Learning , Humans , MicroRNAs/genetics , Prognosis , RNA, Messenger/genetics
4.
Bioinformatics ; 38(5): 1353-1360, 2022 02 07.
Article in English | MEDLINE | ID: mdl-34864881

ABSTRACT

MOTIVATION: Drug repositioning that aims to find new indications for existing drugs has been an efficient strategy for drug discovery. In the scenario where we only have confirmed disease-drug associations as positive pairs, a negative set of disease-drug pairs is usually constructed from the unknown disease-drug pairs in previous studies, where we do not know whether drugs and diseases can be associated, to train a model for disease-drug association prediction (drug repositioning). Drugs and diseases in these negative pairs can potentially be associated, but most studies have ignored them. RESULTS: We present a method, springD2A, to capture the uncertainty in the negative pairs, and to discriminate between positive and unknown pairs because the former are more reliable. In springD2A, we introduce a spring-like penalty for the loss of negative pairs, which is strong if they are too close in a unit sphere, but mild if they are at a moderate distance. We also design a sequential sampling in which the probability of an unknown disease-drug pair sampled as negative is proportional to its score predicted as positive. Multiple models are learned during sequential sampling, and we adopt parameter- and feature-based ensemble schemes to boost performance. Experiments show springD2A is an effective tool for drug repositioning. AVAILABILITY AND IMPLEMENTATION: A Python implementation of springD2A and datasets used in this study are available at https://github.com/wangyuanhao/springD2A. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
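
The sampling rule stated in the abstract (an unknown pair is drawn as a negative with probability proportional to its predicted positive score) can be sketched as follows. This is not taken from the springD2A repository: the function name, the choice to sample without replacement and the example scores are assumptions made for illustration.

```python
import numpy as np

def sample_negatives(scores, n_negatives, rng):
    """Draw indices of unknown pairs to treat as negatives.

    scores: predicted positive scores of the unknown pairs (non-negative).
    Sampling without replacement is an assumption, not stated in the abstract.
    """
    scores = np.asarray(scores, dtype=float)
    if np.any(scores < 0) or scores.sum() == 0:
        raise ValueError("scores must be non-negative and not all zero")
    probs = scores / scores.sum()  # probability proportional to score
    idx = rng.choice(len(scores), size=n_negatives, replace=False, p=probs)
    return idx, probs

rng = np.random.default_rng(0)
hypothetical_scores = [0.1, 0.2, 0.3, 0.4]  # invented model outputs
negatives, probs = sample_negatives(hypothetical_scores, 2, rng)
```

In a sequential scheme, the scores would be refreshed by the model trained in the previous round before the next draw.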


Subjects
Algorithms , Drug Repositioning , Uncertainty , Probability , Drug Discovery
5.
BMC Bioinformatics ; 17: 100, 2016 Feb 24.
Article in English | MEDLINE | ID: mdl-26911324

ABSTRACT

BACKGROUND: Protein complexes carry out nearly all signaling and functional processes within cells. The study of protein complexes is an effective strategy to analyze cellular functions and biological processes. With the increasing availability of proteomics data, various computational methods have recently been developed to predict protein complexes. However, different computational methods are based on their own assumptions and designed to work on different data sources, and various biological screening methods have their unique experiment conditions, and are often different in scale and noise level. Therefore, a single computational method on a specific data source is generally not able to generate comprehensive and reliable prediction results. RESULTS: In this paper, we develop a novel Two-layer INtegrative Complex Detection (TINCD) model to detect protein complexes, leveraging the information from both clustering results and raw data sources. In particular, we first integrate various clustering results to construct consensus matrices for proteins to measure their overall co-complex propensity. Second, we combine these consensus matrices with the co-complex score matrix derived from Tandem Affinity Purification/Mass Spectrometry (TAP) data and obtain an integrated co-complex similarity network via an unsupervised metric fusion method. Finally, a novel graph regularized doubly stochastic matrix decomposition model is proposed to detect overlapping protein complexes from the integrated similarity network. CONCLUSIONS: Extensive experimental results demonstrate that TINCD performs much better than 21 state-of-the-art complex detection techniques, including ensemble clustering and data integration techniques.


Subjects
Protein Interaction Mapping/methods , Proteins/metabolism , Proteomics/methods , Humans
6.
BMC Bioinformatics ; 17(1): 358, 2016 Sep 09.
Article in English | MEDLINE | ID: mdl-27612563

ABSTRACT

BACKGROUND: Several recent studies have used the Minimum Dominating Set (MDS) model to identify driver nodes, which provide control of the underlying networks, in protein interaction networks. Multiple MDS configurations may exist in a given network, so it is difficult to determine which one represents the real set of driver nodes. Because these previous studies only focus on static networks and ignore the contextual information on particular tissues, their findings could be insufficient or even misleading. RESULTS: In this study, we develop a Collective-Influence-corrected Minimum Dominating Set (CI-MDS) model which takes into account the collective influence of proteins. By integrating molecular expression profiles and static protein interactions, 16 tissue-specific networks are established as well. We then apply the CI-MDS model to each tissue-specific network to detect MDS proteins. It generates almost the same MDSs when it is solved using different optimization algorithms. In addition, we classify MDS proteins into Tissue-Specific MDS (TS-MDS) proteins and HouseKeeping MDS (HK-MDS) proteins based on the number of tissues in which they are expressed and identified as MDS proteins. Notably, we find that TS-MDS proteins and HK-MDS proteins have significantly different topological and functional properties. HK-MDS proteins are more central in protein interaction networks, are associated with more functions, evolve more slowly and are subject to a greater number of post-translational modifications than TS-MDS proteins. Unlike TS-MDS proteins, HK-MDS proteins significantly correspond to essential genes, ageing genes, virus-targeted proteins, transcription factors and protein kinases. Moreover, we find that besides HK-MDS proteins, many TS-MDS proteins are also linked to disease-related genes, suggesting the tissue specificity of human diseases. Furthermore, functional enrichment analysis reveals that HK-MDS proteins carry out universally necessary biological processes and TS-MDS proteins are usually involved in tissue-dependent functions. CONCLUSIONS: Our study uncovers key features of TS-MDS proteins and HK-MDS proteins, and is a step towards a better understanding of the controllability of human interactomes.
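
As background for the term "collective influence", the sketch below computes the widely used definition CI_l(i) = (k_i - 1) * sum over nodes j at exactly l hops from i of (k_j - 1), where k is the node degree. The abstract does not state which radius the authors use or how the score enters the MDS objective, so the radius and the small example network are assumptions.

```python
from collections import deque

def collective_influence(adj, node, radius=2):
    """Collective influence of `node` using the frontier at `radius` hops.

    adj: dict mapping each node to the set of its neighbours (undirected).
    """
    dist = {node: 0}
    queue = deque([node])
    frontier = []
    while queue:
        u = queue.popleft()
        if dist[u] == radius:
            frontier.append(u)  # node lies exactly `radius` hops away
            continue            # do not expand beyond the ball boundary
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return (len(adj[node]) - 1) * sum(len(adj[v]) - 1 for v in frontier)

# Hypothetical toy interaction network
toy = {
    "a": {"b", "c"}, "b": {"a", "d"}, "c": {"a", "e"},
    "d": {"b", "f", "g"}, "e": {"c"}, "f": {"d"}, "g": {"d"},
}
ci_a = collective_influence(toy, "a", radius=2)
ci_d = collective_influence(toy, "d", radius=1)
ci_e = collective_influence(toy, "e", radius=2)
```

Leaf nodes always score zero under this definition, which is why such a score can help break ties between otherwise equivalent dominating sets.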


Subjects
Genes, Essential , Organ Specificity/genetics , Protein Interaction Maps , Aging/genetics , Algorithms , Evolution, Molecular , Gene Ontology , Humans , Models, Theoretical , Neoplasms/genetics , Protein Processing, Post-Translational/genetics , Viruses/metabolism
7.
BMC Bioinformatics ; 17(1): 371, 2016 Sep 13.
Article in English | MEDLINE | ID: mdl-27623844

ABSTRACT

BACKGROUND: Protein complexes are the key molecular entities that perform many essential biological functions. In recent years, high-throughput experimental techniques have generated a large amount of protein interaction data. As a consequence, computational analysis of such data for protein complex detection has received increased attention in the literature. However, most existing works focus on predicting protein complexes from a single type of data, either physical interaction data or co-complex interaction data. These two types of data provide compatible and complementary information, so it is necessary to integrate them to discover the underlying structures and obtain better performance in complex detection. RESULTS: In this study, we propose a novel multi-view clustering algorithm, called the Partially Shared Multi-View Clustering model (PSMVC), to carry out such an integrated analysis. Unlike traditional multi-view learning algorithms that focus on mining either consistent or complementary information embedded in the multi-view data, PSMVC can jointly explore the shared and specific information inherent in different views. In our experiments, we compare the complexes detected by PSMVC from a single data source with those detected from multiple data sources. We observe that jointly analyzing multi-view data benefits the detection of protein complexes. Furthermore, extensive experimental results demonstrate that PSMVC performs much better than 16 state-of-the-art complex detection techniques, including ensemble clustering and data integration techniques. CONCLUSIONS: In this work, we demonstrate that when integrating multiple data sources, using a partially shared multi-view clustering model can help to identify protein complexes which are not readily identifiable by conventional single-view-based methods and other integrative analysis methods. All results and source code are available at https://github.com/Oyl-CityU/PSMVC.


Subjects
Cluster Analysis , Protein Interaction Domains and Motifs/immunology , Protein Interaction Mapping/methods , Information Storage and Retrieval
8.
BMC Bioinformatics ; 17: 108, 2016 Feb 27.
Article in English | MEDLINE | ID: mdl-26921029

ABSTRACT

BACKGROUND: To facilitate advances in personalized medicine, it is important to detect predictive, stable and interpretable biomarkers related to different clinical characteristics. These clinical characteristics may be heterogeneous with respect to underlying interactions between genes. Traditional methods usually focus only on detecting differentially expressed genes without taking the interactions between genes into account. Moreover, due to the typically low reproducibility of the selected biomarkers, it is difficult to give a clear biological interpretation for a specific disease. Therefore, it is necessary to design a robust biomarker identification method that can predict disease-associated interactions with high reproducibility. RESULTS: In this article, we propose a regularized logistic regression model. Unlike previous methods, which focus on individual genes or modules, our model takes gene pairs, which are connected in a protein-protein interaction network, into account. A line graph is constructed to represent the adjacencies between pairwise interactions. Based on this line graph, we incorporate the degree information in the model via an adaptive elastic net, which makes our model less dependent on the expression data. Experimental results on six publicly available breast cancer datasets show that our method can not only achieve competitive performance in classification, but also retain great stability in variable selection. Therefore, our model is able to identify the diagnostic and prognostic biomarkers in a more robust way. Moreover, most of the biomarkers discovered by our model have been verified in biochemical or biomedical research. CONCLUSIONS: The proposed method shows promise in the diagnosis of disease pathogenesis with different clinical characteristics. These advances lead to more accurate and stable biomarker discovery, which can monitor the functional changes that are perturbed by diseases. Based on these predictions, researchers may be able to provide suggestions for new therapeutic approaches.
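
The line-graph construction mentioned in the abstract turns each interaction (edge) into a node and connects two such nodes when the interactions share a protein. The sketch below shows that standard construction; the gene names are placeholders and the way degrees are later used in the adaptive elastic net is not specified here.

```python
from itertools import combinations

def line_graph(edges):
    """Build the line graph of an undirected interaction network.

    Returns a dict mapping each edge (as a sorted tuple) to the set of
    edges that share an endpoint with it.
    """
    nodes = sorted({tuple(sorted(e)) for e in edges})
    adjacency = {e: set() for e in nodes}
    for e1, e2 in combinations(nodes, 2):
        if set(e1) & set(e2):  # the two interactions share a protein
            adjacency[e1].add(e2)
            adjacency[e2].add(e1)
    return adjacency

# Placeholder gene names forming a simple chain A-B-C-D
ppi_edges = [("A", "B"), ("B", "C"), ("C", "D")]
lg = line_graph(ppi_edges)
degrees = {edge: len(neighbours) for edge, neighbours in lg.items()}
```

The resulting `degrees` dictionary is the kind of per-interaction degree information that a penalty weight could be derived from.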


Subjects
Biomarkers/metabolism , Breast Neoplasms/diagnosis , Breast Neoplasms/genetics , Logistic Models , Protein Interaction Maps , Female , Humans , Models, Theoretical , Precision Medicine , Reproducibility of Results
9.
BMC Bioinformatics ; 16: 146, 2015 May 07.
Article in English | MEDLINE | ID: mdl-25947063

ABSTRACT

BACKGROUND: Recently, several studies have drawn attention to the determination of a minimum set of driver proteins that are important for the control of the underlying protein-protein interaction (PPI) networks. In general, the minimum dominating set (MDS) model is widely adopted. However, because the MDS model does not generate a unique MDS configuration, multiple different MDSs would be generated when using different optimization algorithms. Therefore, among these MDSs, it is difficult to determine which one represents the true driver set of proteins. RESULTS: To address this problem, we develop a centrality-corrected minimum dominating set (CC-MDS) model which includes heterogeneity in degree and betweenness centralities of proteins. Both the MDS model and the CC-MDS model are applied to three human PPI networks. Unlike the MDS model, the CC-MDS model generates almost the same sets of driver proteins when we implement it using different optimization algorithms. The CC-MDS model targets more high-degree and high-betweenness proteins than the uncorrected counterpart. The more central position allows CC-MDS proteins to be more important in maintaining the overall network connectivity than MDS proteins. To indicate the functional significance, we find that CC-MDS proteins are involved in, on average, more protein complexes and GO annotations than MDS proteins. We also find that more essential genes, aging genes, disease-associated genes and virus-targeted genes appear in CC-MDS proteins than in MDS proteins. As for the involvement in regulatory functions, the sets of CC-MDS proteins show much stronger enrichment of transcription factors and protein kinases. The results about topological and functional significance demonstrate that the CC-MDS model can capture more driver proteins than the MDS model. CONCLUSIONS: Based on the results obtained, the CC-MDS model appears to be a powerful tool for the determination of driver proteins that can control the underlying PPI networks. The software described in this paper and the datasets used are available at https://github.com/Zhangxf-ccnu/CC-MDS.
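
The non-uniqueness of dominating sets discussed above is easy to see on a small graph. The sketch below uses a simple greedy heuristic, which is a stand-in chosen for brevity and not the optimization used in the paper or in the CC-MDS repository; the toy network is hypothetical.

```python
def is_dominating(adj, nodes):
    """True if every node is in `nodes` or adjacent to a member of it."""
    covered = set(nodes)
    for u in nodes:
        covered |= adj[u]
    return covered == set(adj)

def greedy_dominating_set(adj):
    """Greedy approximation: repeatedly pick the node covering most new nodes."""
    undominated = set(adj)
    chosen = []
    while undominated:
        # sorted() makes tie-breaking deterministic (first maximum wins)
        best = max(sorted(adj), key=lambda u: len(({u} | adj[u]) & undominated))
        chosen.append(best)
        undominated -= {best} | adj[best]
    return chosen

# Hypothetical network: hub "h" plus a pendant node "d" attached to "c"
toy = {
    "h": {"a", "b", "c"}, "a": {"h"}, "b": {"h"},
    "c": {"h", "d"}, "d": {"c"},
}
drivers = greedy_dominating_set(toy)
```

Here both `["h", "c"]` and `["h", "d"]` dominate the graph with two nodes, so a centrality-based correction is one way to prefer the more central `c` over the leaf `d`.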


Subjects
Algorithms , Gene Regulatory Networks , Models, Theoretical , Protein Interaction Maps , Proteins/metabolism , Software , Humans
10.
BMC Genomics ; 16: 745, 2015 Oct 05.
Article in English | MEDLINE | ID: mdl-26438428

ABSTRACT

BACKGROUND: The identification of protein-protein interactions contributes greatly to the understanding of functional organization within cells. With the development of affinity purification-mass spectrometry (AP-MS) techniques, several computational scoring methods have been proposed to detect protein interactions from AP-MS data. However, most of the current methods focus on the detection of co-complex interactions and do not discriminate between direct physical interactions and indirect interactions. Consequently, less is known about the precise physical wiring diagram within cells. RESULTS: In this paper, we develop a Binary Interaction Network Model (BINM) to computationally identify direct physical interactions from co-complex interactions which can be inferred from purification data using previous scoring methods. This model provides a mathematical framework for capturing topological relationships between direct physical interactions and observed co-complex interactions. It reassigns a confidence score to each observed interaction to indicate its propensity to be a direct physical interaction. Then observed interactions with high confidence scores are predicted as direct physical interactions. We run our model on two yeast co-complex interaction networks which are constructed by two different scoring methods on the same combined AP-MS dataset. The direct physical interactions identified by various methods are comprehensively benchmarked against different reference sets that provide both direct and indirect evidence for physical contacts. Experimental results show that our model performs competitively with the state-of-the-art methods. CONCLUSIONS: According to the results obtained in this study, BINM is a powerful scoring method that can solely use network topology to predict direct physical interactions from AP-MS data. This study provides an alternative approach to exploring the information inherent in AP-MS data. The software can be downloaded from https://github.com/Zhangxf-ccnu/BINM.


Subjects
Protein Interaction Mapping/methods , Proteins/metabolism , Algorithms , Datasets as Topic , Mass Spectrometry , Models, Biological , Protein Binding , Protein Interaction Maps , Reproducibility of Results
11.
BMC Bioinformatics ; 15: 186, 2014 Jun 13.
Article in English | MEDLINE | ID: mdl-24928559

ABSTRACT

BACKGROUND: Identification of protein complexes can help us get a better understanding of cellular mechanisms. With the increasing availability of large-scale protein-protein interaction (PPI) data, numerous computational approaches have been proposed to detect complexes from the PPI networks. However, most of the current approaches do not consider overlaps among complexes or functional annotation information of individual proteins. Therefore, they might not be able to reflect the biological reality faithfully or make full use of the available domain-specific knowledge. RESULTS: In this paper, we develop a Generative Model with Functional and Topological Properties (GMFTP) to describe the generative processes of the PPI network and the functional profile. The model provides a working mechanism for capturing the interaction structures and the functional patterns of proteins. By combining the functional and topological properties, we formulate the problem of identifying protein complexes as that of detecting a group of proteins which frequently interact with each other in the PPI network and have similar annotation patterns in the functional profile. Using the idea of link communities, our method naturally deals with overlaps among complexes. The benefits brought by the functional properties are demonstrated by real data analysis. The results evaluated using four criteria with respect to two gold standards show that GMFTP performs competitively with the state-of-the-art approaches. The effectiveness of detecting overlapping complexes is also demonstrated by analyzing the topological and functional features of multi- and mono-group proteins. CONCLUSIONS: Based on the results obtained in this study, GMFTP appears to be a powerful approach for the identification of overlapping protein complexes using both the PPI network and the functional profile. The software can be downloaded from http://mail.sysu.edu.cn/home/stsddq@mail.sysu.edu.cn/dai/others/GMFTP.zip.


Subjects
Proteins/metabolism , Algorithms , Humans , Models, Biological , Probability , Protein Interaction Mapping/methods , Proteins/chemistry , Software
12.
BMC Bioinformatics ; 15: 335, 2014 Oct 04.
Article in English | MEDLINE | ID: mdl-25282536

ABSTRACT

BACKGROUND: Proteins dynamically interact with each other to perform their biological functions. The dynamic operations of protein-protein interaction (PPI) networks are also reflected in the dynamic formations of protein complexes. Existing protein complex detection algorithms usually overlook the inherent temporal nature of protein interactions within PPI networks. Systematically analyzing the temporal protein complexes can not only improve the accuracy of protein complex detection, but also strengthen our biological knowledge on the dynamic protein assembly processes for cellular organization. RESULTS: In this study, we propose a novel computational method to predict temporal protein complexes. Particularly, we first construct a series of dynamic PPI networks by joint analysis of time-course gene expression data and protein interaction data. Then a Time Smooth Overlapping Complex Detection model (TS-OCD) is proposed to detect temporal protein complexes from these dynamic PPI networks. TS-OCD can naturally capture the smoothness of networks between consecutive time points and detect overlapping protein complexes at each time point. Finally, a nonnegative matrix factorization based algorithm is introduced to merge those very similar temporal complexes across different time points. CONCLUSIONS: Extensive experimental results demonstrate the proposed method is more effective in detecting temporal protein complexes than the state-of-the-art complex detection techniques.


Subjects
Computational Biology/methods , Protein Interaction Mapping/methods , Proteins/metabolism , Algorithms , Gene Expression Profiling , Proteins/genetics , Time Factors
13.
Med Image Anal ; 93: 103103, 2024 Apr.
Article in English | MEDLINE | ID: mdl-38368752

ABSTRACT

Accurate prognosis prediction for nasopharyngeal carcinoma based on magnetic resonance (MR) images assists in the guidance of treatment intensity, thus reducing the risk of recurrence and death. To reduce repeated labor and sufficiently explore domain knowledge, aggregating labeled/annotated data from external sites enables us to train an intelligent model for a clinical site with unlabeled data. However, this task suffers from the challenges of incomplete multi-modal examination data fusion and image data heterogeneity among sites. This paper proposes a cross-site survival analysis method for prognosis prediction of nasopharyngeal carcinoma from a domain adaptation viewpoint. Utilizing a Cox model as the basic framework, our method equips it with a cross-attention-based multi-modal fusion regularization. This regularization model effectively fuses the multi-modal information from multi-parametric MR images and clinical features onto a domain-adaptive space, despite the absence of some modalities. To enhance the feature discrimination, we also extend the contrastive learning technique to censored data cases. Compared with conventional approaches that directly deploy a trained survival model in a new site, our method achieves superior prognosis prediction performance in cross-site validation experiments. These results highlight the key role of cross-site adaptability of our method and support its value in clinical practice.
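
The Cox model that serves as the basic framework is usually trained by minimizing the negative log partial likelihood. The sketch below shows that standard loss with Breslow-style handling of tied times; it does not include the paper's fusion regularization or contrastive term, and the toy inputs are invented.

```python
import numpy as np

def cox_neg_log_partial_likelihood(risk, times, events):
    """Average negative log partial likelihood over observed events.

    risk:   predicted log-risk score per subject (higher = earlier event).
    times:  follow-up times.
    events: 1 if the event was observed, 0 if censored.
    """
    risk = np.asarray(risk, dtype=float)
    times = np.asarray(times, dtype=float)
    events = np.asarray(events, dtype=bool)
    loss = 0.0
    for i in np.flatnonzero(events):
        at_risk = times >= times[i]  # subjects still under observation
        loss -= risk[i] - np.log(np.sum(np.exp(risk[at_risk])))
    return loss / max(int(events.sum()), 1)

# Invented toy cohort with all events observed
times = [1.0, 2.0, 3.0]
events = [1, 1, 1]
flat = cox_neg_log_partial_likelihood([0.0, 0.0, 0.0], times, events)
good = cox_neg_log_partial_likelihood([2.0, 1.0, 0.0], times, events)
bad = cox_neg_log_partial_likelihood([0.0, 1.0, 2.0], times, events)
```

Censored subjects contribute only through the risk sets, which is what makes the loss usable with incomplete follow-up.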


Subjects
Learning , Nasopharyngeal Neoplasms , Humans , Nasopharyngeal Carcinoma/diagnostic imaging , Prognosis , Nasopharyngeal Neoplasms/diagnostic imaging
14.
IEEE Trans Pattern Anal Mach Intell ; 45(4): 4198-4213, 2023 Apr.
Article in English | MEDLINE | ID: mdl-35830411

ABSTRACT

As a fundamental mode of learning and cognition, transfer learning has attracted widespread attention in recent years. Typical transfer learning tasks include unsupervised domain adaptation (UDA) and few-shot learning (FSL), which both attempt to sufficiently transfer discriminative knowledge from the training environment to the test environment to improve the model's generalization performance. Previous transfer learning methods usually ignore the potential conditional distribution shift between environments. This leads to degraded discriminability in the test environments. Therefore, how to construct a learnable and interpretable metric to measure and then reduce the gap between conditional distributions is very important in the literature. In this article, we design the Conditional Kernel Bures (CKB) metric for characterizing conditional distribution discrepancy, and derive an empirical estimation with convergence guarantee. CKB provides a statistical and interpretable approach, under the optimal transportation framework, to understand the knowledge transfer mechanism. It is essentially an extension of optimal transportation from the marginal distributions to the conditional distributions. CKB can be used as a plug-and-play module and placed onto the loss layer in deep networks; thus, it plays the bottleneck role in representation learning. From this perspective, the new method with network architecture is abbreviated as BuresNet, and it can be used to extract conditional invariant features for both UDA and FSL tasks. BuresNet can be trained in an end-to-end manner. Extensive experimental results on several benchmark datasets validate the effectiveness of BuresNet.
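
As background, the classical Bures distance between two covariance matrices A and B is B^2(A, B) = tr(A) + tr(B) - 2 tr((A^{1/2} B A^{1/2})^{1/2}). The sketch below computes this finite-dimensional quantity only; it is not the conditional kernel estimator derived in the paper, and the example matrices are arbitrary.

```python
import numpy as np

def psd_sqrt(m):
    """Matrix square root of a symmetric positive semi-definite matrix."""
    m = (m + m.T) / 2.0  # symmetrise to suppress round-off asymmetry
    vals, vecs = np.linalg.eigh(m)
    vals = np.clip(vals, 0.0, None)  # guard against tiny negative eigenvalues
    return (vecs * np.sqrt(vals)) @ vecs.T

def bures_distance_sq(a, b):
    """Squared Bures distance between covariance matrices a and b."""
    root_a = psd_sqrt(a)
    cross = psd_sqrt(root_a @ b @ root_a)
    return float(np.trace(a) + np.trace(b) - 2.0 * np.trace(cross))

A = np.diag([1.0, 4.0])
B = np.diag([9.0, 16.0])
d_ab = bures_distance_sq(A, B)
d_aa = bures_distance_sq(A, A)
```

For commuting matrices such as these diagonal ones, the value reduces to the squared difference of the matrix square roots, which gives a quick sanity check.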

15.
Proteins ; 80(9): 2137-53, 2012 Aug.
Article in English | MEDLINE | ID: mdl-22544808

ABSTRACT

Protein-ligand docking is widely applied to structure-based virtual screening for drug discovery. This article presents a novel docking technique, PRL-Dock, based on hydrogen bond matching and probabilistic relaxation labeling. It deals with multiple hydrogen bonds and can match many acceptors and donors simultaneously. In the matching process, the initial probability of matching an acceptor with a donor is estimated by an efficient scoring function and the compatibility coefficients are assigned according to the coexisting condition of two hydrogen bonds. After hydrogen bond matching, the geometric complementarity of the interacting donor and acceptor sites is taken into account for displacement of the ligand. It is reduced to an optimization problem to calculate the optimal translation and rotation matrices that minimize the root mean square deviation between two sets of points, which can be solved using the Kabsch algorithm. In addition to the van der Waals interaction, the contribution of intermolecular hydrogen bonds in a complex is included in the scoring function to evaluate the docking quality. A modified Lennard-Jones 12-6 dispersion-repulsion term is used to estimate the van der Waals interaction to make the scoring function fairly "soft" so that ligands are not heavily penalized for small errors in the binding geometry. The calculation of this scoring function is very convenient. The evaluation is carried out on 278 rigid complexes and 93 flexible ones where there is at least one intermolecular hydrogen bond. The experimental results on docking accuracy and prediction of binding affinity demonstrate that the proposed method is highly effective.
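
The Kabsch step named in the abstract (optimal rotation and translation minimizing the RMSD between two matched point sets) can be sketched with a singular value decomposition. This is the generic algorithm, not PRL-Dock code; the point sets below are synthetic.

```python
import numpy as np

def kabsch(P, Q):
    """Return rotation R and translation t so that R @ p + t best matches q.

    P, Q: (n_points, 3) arrays of matched coordinates.
    """
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    cp, cq = P.mean(axis=0), Q.mean(axis=0)  # centroids
    H = (P - cp).T @ (Q - cq)                # 3x3 cross-covariance matrix
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))   # -1 would indicate a reflection
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T  # proper rotation (det = +1)
    t = cq - R @ cp
    return R, t

# Synthetic check: rotate by 90 degrees about z, then shift
P = np.array([[0, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1]], dtype=float)
R_true = np.array([[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
t_true = np.array([1.0, 2.0, 3.0])
Q = P @ R_true.T + t_true
R_est, t_est = kabsch(P, Q)
rmsd = np.sqrt(np.mean(np.sum((P @ R_est.T + t_est - Q) ** 2, axis=1)))
```

The determinant correction matters in docking because a reflection would turn the ligand into its mirror image.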


Subjects
Algorithms , Proteins/chemistry , Proteins/metabolism , Computational Biology , Computer Simulation , Databases, Protein , Hydrogen Bonding , Models, Chemical , Models, Molecular , Protein Binding , Reproducibility of Results
16.
IEEE Trans Pattern Anal Mach Intell ; 44(3): 1653-1669, 2022 03.
Article in English | MEDLINE | ID: mdl-32749963

ABSTRACT

Unsupervised domain adaptation is effective in transferring rich information from a labeled source domain to an unlabeled target domain. Although deep learning and adversarial strategies have brought significant breakthroughs in feature adaptability, two issues remain to be studied. First, hard-assigned pseudo labels on the target domain are arbitrary and error-prone, and applying them directly may destroy the intrinsic data structure. Second, batch-wise training of deep models limits the characterization of the global structure. In this paper, a Riemannian manifold learning framework is proposed to achieve transferability and discriminability simultaneously. For the first issue, this framework establishes a probabilistic discriminant criterion on the target domain via soft labels. Based on pre-built prototypes, this criterion is extended to a global approximation scheme for the second issue. Manifold metric alignment is adopted to be compatible with the embedding space. The theoretical error bounds of different alignment metrics are derived for constructive guidance. The proposed method can be used to tackle a series of variants of domain adaptation problems, including both vanilla and partial settings. Extensive experiments have been conducted to investigate the method, and a comparative study shows the superiority of the discriminative manifold learning framework.
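
The contrast between hard pseudo labels and prototype-based soft labels can be illustrated with a small sketch. It is an assumption-laden simplification, not the paper's criterion: it uses squared Euclidean distance and a softmax with a hypothetical `temperature` parameter, whereas the abstract refers to a Riemannian manifold metric that is not specified here. The function name `soft_labels` is likewise hypothetical.

```python
import numpy as np

def soft_labels(features, prototypes, temperature=1.0):
    """Assign each sample a probability over classes from its distance to prototypes.

    features:   (n_samples, dim) target-domain features
    prototypes: (n_classes, dim) one pre-built prototype per class
    Assumption: squared Euclidean distance stands in for the manifold metric.
    """
    features = np.asarray(features, dtype=float)
    prototypes = np.asarray(prototypes, dtype=float)
    # Pairwise squared distances, shape (n_samples, n_classes).
    diff = features[:, None, :] - prototypes[None, :, :]
    sq_dist = np.sum(diff ** 2, axis=2)
    # Softmax over negative distances; subtract the row max for numerical stability.
    logits = -sq_dist / temperature
    logits -= logits.max(axis=1, keepdims=True)
    weights = np.exp(logits)
    return weights / weights.sum(axis=1, keepdims=True)
```

A hard pseudo label would be `soft_labels(...).argmax(axis=1)`; keeping the full probability row preserves the uncertainty of samples that lie between prototypes.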


Subjects
Algorithms
17.
IEEE Trans Cybern ; 52(8): 8352-8365, 2022 Aug.
Article in English | MEDLINE | ID: mdl-33544687

ABSTRACT

Hyperspectral image (HSI) classification is an active topic in remote sensing with a broad range of applications, and convolutional neural network (CNN)-based methods are drawing increasing attention. However, training the millions of parameters in a CNN requires a large number of labeled training samples, which are difficult to collect. A conventional Gabor filter can effectively extract spatial information at different scales and orientations without training, but it may miss some important discriminative information. In this article, we propose the Gabor ensemble filter (GEF), a new convolutional filter that extracts deep features for HSI with fewer trainable parameters. GEF filters each input channel with fixed Gabor filters and learnable filters simultaneously, then reduces the dimensions with learnable 1×1 filters to generate the output channels. The fixed Gabor filters extract common features at different scales and orientations, while the learnable filters learn complementary features that Gabor filters cannot extract. Based on GEF, we design a network architecture for HSI classification that extracts deep features and can learn from limited training samples. To learn more discriminative features while keeping the system end-to-end, we introduce local discriminant structure into the cross-entropy loss by combining it with the triplet hard loss. Results of experiments on three HSI datasets show that the proposed method has significantly higher classification accuracy than other state-of-the-art methods. Moreover, the proposed method is fast in both training and testing.
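
The fixed part of such a filter can be illustrated with a minimal sketch that builds a bank of standard real-valued Gabor kernels at several scales and orientations. The function names and parameters (`gabor_kernel`, `gabor_bank`, `gamma`, `psi`) are hypothetical choices for this sketch, not the GEF implementation; the learnable filters and the 1×1 mixing step are not shown.

```python
import numpy as np

def gabor_kernel(size, wavelength, theta, sigma, gamma=0.5, psi=0.0):
    """Real Gabor kernel: a Gaussian envelope times a cosine carrier.

    size is assumed odd so the kernel has a well-defined center pixel.
    """
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    # Rotate coordinates so the carrier oscillates along direction theta.
    x_rot = x * np.cos(theta) + y * np.sin(theta)
    y_rot = -x * np.sin(theta) + y * np.cos(theta)
    envelope = np.exp(-(x_rot ** 2 + (gamma * y_rot) ** 2) / (2.0 * sigma ** 2))
    carrier = np.cos(2.0 * np.pi * x_rot / wavelength + psi)
    return envelope * carrier

def gabor_bank(size, wavelengths, n_orientations, sigma):
    """Stack kernels for every (wavelength, orientation) pair into one array."""
    thetas = np.arange(n_orientations) * np.pi / n_orientations
    kernels = [gabor_kernel(size, lam, th, sigma)
               for lam in wavelengths for th in thetas]
    return np.stack(kernels)  # shape: (n_scales * n_orientations, size, size)
```

Because these kernels are fixed, they add no trainable parameters; only the filters that accompany them in the network would need to be learned.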


Subjects
Algorithms , Neural Networks, Computer
18.
Opt Lett ; 36(19): 3933-5, 2011 Oct 01.
Article in English | MEDLINE | ID: mdl-21964146

ABSTRACT

Because of the limited approximation capability of using fixed basis functions, the performance of reflectance estimation obtained by traditional linear models will not be optimal. We propose an approach based on the regularized local linear model. Our approach performs efficiently and knowledge of the spectral power distribution of the illuminant and the spectral sensitivities of the camera is not needed. Experimental results show that the proposed method performs better than some well-known methods in terms of both reflectance error and colorimetric error.
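
One way to picture a regularized local linear model is a ridge regression fitted only on the training samples closest to the query. The sketch below makes several assumptions the abstract does not state: neighbors are chosen by Euclidean distance between camera responses, the regularizer is a plain Tikhonov (ridge) term, and the names `local_ridge_estimate`, `k`, and `lam` are hypothetical.

```python
import numpy as np

def local_ridge_estimate(train_resp, train_refl, query_resp, k=20, lam=1e-3):
    """Estimate a reflectance spectrum from one camera response.

    train_resp: (n, c) camera responses of training samples
    train_refl: (n, w) measured reflectances of the same samples
    query_resp: (c,)   camera response of the sample to estimate
    Assumptions: k-nearest-neighbor selection and ridge regularization.
    """
    train_resp = np.asarray(train_resp, dtype=float)
    train_refl = np.asarray(train_refl, dtype=float)
    query_resp = np.asarray(query_resp, dtype=float)
    # Select the k training samples whose responses are closest to the query.
    dist = np.linalg.norm(train_resp - query_resp, axis=1)
    idx = np.argsort(dist)[:k]
    X = train_resp[idx]
    Y = train_refl[idx]
    # Ridge solution W = (X^T X + lam * I)^{-1} X^T Y, fitted on the neighbors only.
    A = X.T @ X + lam * np.eye(X.shape[1])
    W = np.linalg.solve(A, X.T @ Y)
    return query_resp @ W
```

Fitting directly from responses to reflectances is what lets a model of this kind avoid explicit knowledge of the illuminant and camera sensitivities.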

19.
IEEE/ACM Trans Comput Biol Bioinform ; 18(6): 2891-2897, 2021.
Article in English | MEDLINE | ID: mdl-33656995

ABSTRACT

The identification of cancer subtypes is of great importance for understanding the heterogeneity of tumors and providing patients with more accurate diagnoses and treatments. However, it is still a challenge to effectively integrate multiple omics data to establish cancer subtypes. In this paper, we propose an unsupervised integration method, named weighted multi-view low rank representation (WMLRR), to identify cancer subtypes from multiple types of omics data. Given a group of patients described by multiple omics data matrices, we first learn a unified affinity matrix that encodes the similarities among patients by exploring the sparsity-consistent low-rank representations from the joint decompositions of the omics data matrices. Unlike existing subtype identification methods that treat each omics data matrix equally, we assign a weight to each omics data matrix and learn these weights automatically through the optimization process. Finally, we apply spectral clustering to the learned affinity matrix to identify cancer subtypes. Experimental results show that survival times differ significantly between the identified cancer subtypes, and that our survival predictions are more accurate than those of other state-of-the-art methods. In addition, clinical analyses of the diseases also demonstrate the effectiveness of our method in identifying molecular subtypes with biological significance and clinical relevance.
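
The final step described in the abstract, clustering patients from an affinity matrix, can be sketched in a simplified form. This is not WMLRR: the weights here are supplied by hand rather than learned, the affinities are fused by a plain weighted average instead of a low-rank representation, and the clustering is a two-way spectral split on the sign of the Fiedler vector rather than full k-way spectral clustering. The function names are hypothetical.

```python
import numpy as np

def fuse_affinities(affinities, weights):
    """Weighted average of per-omics affinity matrices (weights fixed by assumption)."""
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()
    return sum(w * np.asarray(A, dtype=float) for w, A in zip(weights, affinities))

def spectral_bipartition(W):
    """Split samples into two groups using the normalized graph Laplacian."""
    W = (W + W.T) / 2.0                      # enforce symmetry
    deg = W.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    # Normalized Laplacian L = I - D^{-1/2} W D^{-1/2}.
    L = np.eye(len(W)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    eigvals, eigvecs = np.linalg.eigh(L)     # eigenvalues in ascending order
    fiedler = eigvecs[:, 1]                  # eigenvector of second-smallest eigenvalue
    return (fiedler > 0).astype(int)
```

For more than two subtypes, the usual extension is to take the first k eigenvectors as an embedding and run k-means on its rows.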


Subjects
Computational Biology/methods , Neoplasms , Unsupervised Machine Learning , Algorithms , Cluster Analysis , DNA Methylation/genetics , Humans , Neoplasms/classification , Neoplasms/genetics , Neoplasms/mortality , Transcriptome/genetics
20.
IEEE Trans Cybern ; 51(4): 2166-2177, 2021 Apr.
Article in English | MEDLINE | ID: mdl-31880576

ABSTRACT

Domain adaptation (DA) and transfer learning with statistical property description are very important in image analysis and data classification. This article studies the domain adaptive feature representation problem for heterogeneous data, in which both the feature dimensions and the sample distributions differ so much across domains that the features cannot be matched directly. To transfer the discriminant information efficiently from the source domain to the target domain, and thereby enhance the classification performance on the target data, we first introduce two domain-specific projection matrices that transform the heterogeneous features into a shared space. We then propose a joint kernel regression model to learn the regression variable, which is called the feature translator in this article. The novelty lies in the use of optimal experimental design (OED) to handle heterogeneous and nonlinear DA by seeking covariance structured feature translators (CSFTs). An approximate and efficient method is proposed to compute the optimal data projections. Comprehensive experiments are conducted to validate the effectiveness and efficacy of the proposed model. The results show the state-of-the-art performance of our method in heterogeneous DA.
