Results 1 - 15 of 15
1.
Entropy (Basel) ; 26(1)2024 Jan 22.
Article in English | MEDLINE | ID: mdl-38275500

ABSTRACT

Large-scale and high-dimensional time series data are widely generated in modern applications such as intelligent transportation and environmental monitoring. However, such data contains much noise, outliers, and missing values due to interference during measurement or transmission. Directly forecasting such types of data (i.e., anomalous data) can be extremely challenging. The traditional method to deal with anomalies is to cut out the time series with anomalous value entries or replace the data. Both methods may lose important knowledge from the original data. In this paper, we propose a multidimensional time series forecasting framework that can better handle anomalous values: the robust temporal nonnegative matrix factorization forecasting model (RTNMFFM) for multi-dimensional time series. RTNMFFM integrates the autoregressive regularizer into nonnegative matrix factorization (NMF) with the application of the L2,1 norm in NMF. This approach improves robustness and alleviates overfitting compared to standard methods. In addition, to improve the accuracy of model forecasts on severely missing data, we propose a periodic smoothing penalty that keeps the sparse time slices as close as possible to the time slice with high confidence. Finally, we train the model using the alternating gradient descent algorithm. Numerous experiments demonstrate that RTNMFFM provides better robustness and better prediction accuracy.
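
The robustness argument for the L2,1 norm can be illustrated with a minimal sketch. It assumes the common column-wise convention (the L2,1 norm is the sum of the Euclidean norms of the columns, here one column per time slice); the abstract does not state which convention RTNMFFM uses, and the residual values below are hypothetical.

```python
import math

def l21_norm(residual):
    """Sum of the Euclidean norms of the columns of a matrix (given as a list of rows)."""
    n_cols = len(residual[0])
    return sum(
        math.sqrt(sum(row[j] ** 2 for row in residual))
        for j in range(n_cols)
    )

def squared_frobenius(residual):
    """Sum of all squared entries, the loss used by standard NMF."""
    return sum(v * v for row in residual for v in row)

# Hypothetical residual X - WH for three time slices (columns).
# The last column is an outlier with Euclidean norm 50.
noisy = [[0.1, -0.2, 30.0],
         [0.2, 0.1, -40.0]]

# Under the L2,1 norm the outlier contributes its norm (50);
# under the squared Frobenius loss it contributes its squared norm (2500).
print(l21_norm(noisy))
print(squared_frobenius(noisy))
```

Because the outlier enters linearly rather than quadratically, it dominates the L2,1 loss far less than the squared Frobenius loss.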

2.
Entropy (Basel) ; 24(10)2022 Sep 21.
Article in English | MEDLINE | ID: mdl-37420344

ABSTRACT

Accurate clustering is a challenging task with unlabeled data. Ensemble clustering aims to combine sets of base clusterings to obtain a better and more stable clustering and has shown its ability to improve clustering accuracy. Dense representation ensemble clustering (DREC) and entropy-based locally weighted ensemble clustering (ELWEC) are two typical methods for ensemble clustering. However, DREC treats each microcluster equally and hence, ignores the differences between each microcluster, while ELWEC conducts clustering on clusters rather than microclusters and ignores the sample-cluster relationship. To address these issues, a divergence-based locally weighted ensemble clustering with dictionary learning (DLWECDL) is proposed in this paper. Specifically, the DLWECDL consists of four phases. First, the clusters from the base clustering are used to generate microclusters. Second, a Kullback-Leibler divergence-based ensemble-driven cluster index is used to measure the weight of each microcluster. With these weights, an ensemble clustering algorithm with dictionary learning and the L2,1-norm is employed in the third phase. Meanwhile, the objective function is resolved by optimizing four subproblems and a similarity matrix is learned. Finally, a normalized cut (Ncut) is used to partition the similarity matrix and the ensemble clustering results are obtained. In this study, the proposed DLWECDL was validated on 20 widely used datasets and compared to some other state-of-the-art ensemble clustering methods. The experimental results demonstrated that the proposed DLWECDL is a very promising method for ensemble clustering.
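
For reference, the Kullback-Leibler divergence that the second phase builds on can be sketched as follows. This shows only the generic divergence between two discrete distributions; how DLWECDL turns it into a microcluster weight is not specified in the abstract, and the distributions below are hypothetical.

```python
import math

def kl_divergence(p, q):
    """D_KL(p || q) for two discrete distributions given as equal-length lists.

    Terms with p_i == 0 contribute zero; q_i must be positive wherever p_i > 0.
    """
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Hypothetical example: how the members of one microcluster are spread over
# three clusters of a base clustering, compared with a uniform spread.
p = [0.5, 0.25, 0.25]
q = [1 / 3, 1 / 3, 1 / 3]

print(kl_divergence(p, q))  # positive: p differs from the uniform spread
print(kl_divergence(p, p))  # zero: identical distributions
```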

3.
BMC Bioinformatics ; 21(1): 61, 2020 Feb 18.
Article in English | MEDLINE | ID: mdl-32070280

ABSTRACT

BACKGROUND: The aberrant expression of microRNAs is closely connected to the occurrence and development of a great many human diseases. To study human diseases, researchers have presented numerous effective and valuable computational models. RESULTS: Here, we present a computational framework based on graph Laplacian regularized L2,1-nonnegative matrix factorization (GRL2,1-NMF) for inferring possible human disease-connected miRNAs. First, manually validated disease-connected microRNAs were integrated, and microRNA functional similarity information along with two kinds of disease semantic similarities were calculated. Next, we measured Gaussian interaction profile (GIP) kernel similarities for both diseases and microRNAs. Then, we adopted a preprocessing step, namely, weighted K nearest known neighbours (WKNKN), to decrease the sparsity of the miRNA-disease association matrix network. Finally, the GRL2,1-NMF framework was used to predict links between microRNAs and diseases. CONCLUSIONS: The new method (GRL2,1-NMF) achieved AUC values of 0.9280 and 0.9276 in global leave-one-out cross validation (global LOOCV) and five-fold cross validation (5-CV), respectively, showing that GRL2,1-NMF can powerfully discover potential disease-related miRNAs, even if there is no known associated disease.
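
The GIP kernel step can be sketched as below, assuming the usual formulation in which the bandwidth is divided by the mean squared norm of the interaction profiles. The function name `gip_kernel`, the parameter `gamma_prime`, and the toy association matrix are illustrative assumptions, not taken from the paper.

```python
import math

def gip_kernel(profiles, gamma_prime=1.0):
    """Gaussian interaction profile kernel between the rows of a binary matrix.

    Assumes at least one row is nonzero, otherwise the bandwidth is undefined.
    """
    n = len(profiles)
    sq_norms = [sum(v * v for v in row) for row in profiles]
    # Bandwidth normalised by the mean squared norm of the profiles.
    gamma = gamma_prime / (sum(sq_norms) / n)
    kernel = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            dist_sq = sum((a - b) ** 2 for a, b in zip(profiles[i], profiles[j]))
            kernel[i][j] = math.exp(-gamma * dist_sq)
    return kernel

# Hypothetical association matrix: rows are miRNAs, columns are diseases.
assoc = [[1, 0, 1],
         [1, 0, 0],
         [0, 1, 0]]
K = gip_kernel(assoc)
print(K[0][1])  # similarity of the first two miRNAs
```

Disease similarities follow from the same function applied to the transposed matrix, for example `gip_kernel([list(col) for col in zip(*assoc)])`.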


Subject(s)
Algorithms; Disease/genetics; MicroRNAs; Computational Biology/methods; Humans
4.
Hum Genomics ; 13(Suppl 1): 46, 2019 Oct 22.
Article in English | MEDLINE | ID: mdl-31639067

ABSTRACT

BACKGROUND: As one of the most popular data representation methods, non-negative matrix factorization (NMF) has received wide attention in clustering and feature selection tasks. However, most of the previously proposed NMF-based methods do not adequately explore the hidden geometrical structure in the data. At the same time, noise and outliers are inevitably present in the data. RESULTS: To alleviate these problems, we present a novel NMF framework named robust hypergraph regularized non-negative matrix factorization (RHNMF). In particular, the hypergraph Laplacian regularization is imposed to capture the geometric information of the original data. Unlike graph Laplacian regularization, which captures the relationship between pairwise sample points, it captures the high-order relationship among more sample points. Moreover, the robustness of the RHNMF is enhanced by using the L2,1-norm constraint when estimating the residual. This is because the L2,1-norm is insensitive to noise and outliers. CONCLUSIONS: Clustering and common abnormal expression gene (com-abnormal expression gene) selection are conducted to test the validity of the RHNMF model. Extensive experimental results on multi-view datasets reveal that our proposed model outperforms other state-of-the-art methods.
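
To make the contrast with pairwise graphs concrete, here is a minimal sketch of a normalized hypergraph Laplacian, L = I - Dv^(-1/2) H W De^(-1) H^T Dv^(-1/2), where H is the vertex-by-hyperedge incidence matrix. This normalization is an assumption; the abstract does not say which form RHNMF uses. The incidence matrix and weights below are hypothetical, and the code assumes every sample lies in at least one hyperedge.

```python
import math

def hypergraph_laplacian(incidence, edge_weights):
    """Normalized hypergraph Laplacian from a 0/1 incidence matrix (vertices x hyperedges)."""
    n_v, n_e = len(incidence), len(edge_weights)
    # Vertex degree: total weight of the hyperedges that contain the vertex.
    d_v = [sum(incidence[i][e] * edge_weights[e] for e in range(n_e)) for i in range(n_v)]
    # Hyperedge degree: number of vertices in the hyperedge.
    d_e = [sum(incidence[i][e] for i in range(n_v)) for e in range(n_e)]
    lap = [[0.0] * n_v for _ in range(n_v)]
    for i in range(n_v):
        for j in range(n_v):
            shared = sum(
                incidence[i][e] * edge_weights[e] * incidence[j][e] / d_e[e]
                for e in range(n_e)
            )
            identity = 1.0 if i == j else 0.0
            lap[i][j] = identity - shared / math.sqrt(d_v[i] * d_v[j])
    return lap

# Hypothetical data: four samples, hyperedge 0 = {0, 1, 2}, hyperedge 1 = {2, 3}.
H = [[1, 0],
     [1, 0],
     [1, 1],
     [0, 1]]
w = [1.0, 1.0]
L = hypergraph_laplacian(H, w)
```

A single hyperedge here ties three samples together at once, which is the higher-order relationship a pairwise graph cannot express directly.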


Subject(s)
Algorithms; Databases, Genetic; Gene Expression Regulation, Neoplastic; Cluster Analysis; Humans; Neoplasms/genetics
5.
BMC Bioinformatics ; 20(Suppl 25): 686, 2019 Dec 24.
Article in English | MEDLINE | ID: mdl-31874608

ABSTRACT

BACKGROUND: Predicting miRNA-disease associations (MDAs) is time-consuming and expensive. It is urgent to improve the accuracy of prediction results, so it is crucial to develop a novel computing technology to predict new MDAs. Although some existing methods can effectively predict novel MDAs, there are still some shortcomings. In particular, when the disease matrix is processed, its sparsity is an important factor affecting the final results. RESULTS: A robust collaborative matrix factorization (RCMF) is proposed to predict novel MDAs. The L2,1-norm is introduced into our method, which achieves a higher AUC value than other advanced methods. CONCLUSIONS: 5-fold cross validation is used to evaluate our method, and simulation experiments are used to predict novel associations on the Gold Standard Dataset. Our prediction accuracy is better than that of other existing advanced methods. Therefore, our approach is effective and feasible in predicting novel MDAs.


Subject(s)
Algorithms; Liver Neoplasms/genetics; MicroRNAs/metabolism; Area Under Curve; Humans; Liver Neoplasms/pathology; ROC Curve
6.
BMC Bioinformatics ; 20(1): 5, 2019 Jan 05.
Article in English | MEDLINE | ID: mdl-30611214

ABSTRACT

BACKGROUND: Predicting drug-disease interactions (DDIs) is time-consuming and expensive. Improving the accuracy of prediction results is necessary, and it is crucial to develop a novel computing technology to predict new DDIs. The existing methods mostly use the construction of heterogeneous networks to predict new DDIs. However, the number of known interacting drug-disease pairs is small, so there will be many errors in this heterogeneous network that will interfere with the final results. RESULTS: A novel method, known as the dual-network L2,1-collaborative matrix factorization, is proposed to predict novel DDIs. The Gaussian interaction profile kernels and L2,1-norm are introduced in our method to achieve better results than other advanced methods. The network similarities of drugs and diseases with their chemical and semantic similarities are combined in this method. CONCLUSIONS: Cross validation is used to evaluate our method, and simulation experiments are used to predict new interactions using two different datasets. Finally, our prediction accuracy is better than other existing methods. This proves that our method is feasible and effective.


Subject(s)
Algorithms; Computational Biology/methods; Disease; Drug Interactions; Area Under Curve; Databases as Topic; Humans; Reproducibility of Results; Semantics
7.
BMC Bioinformatics ; 20(Suppl 8): 287, 2019 Jun 10.
Article in English | MEDLINE | ID: mdl-31182006

ABSTRACT

BACKGROUND: Predicting drug-target interactions is time-consuming and expensive. It is important to present the accuracy of the calculation method. There are many algorithms to predict global interactions, some of which use drug-target networks for prediction (i.e., a bipartite graph of bound drug pairs and targets known to interact). Although these algorithms can predict some drug-target interactions to some extent, they have little effect for new drugs or targets that have no known interaction. RESULTS: Since the datasets are usually located at or near low-dimensional nonlinear manifolds, we propose an improved GRMF (graph regularized matrix factorization) method to learn these manifolds in combination with the previous matrix-decomposition method. In addition, we use one of the pre-processing steps previously proposed to improve the accuracy of the prediction. CONCLUSIONS: Cross-validation is used to evaluate our method, and simulation experiments are used to predict new interactions. In most cases, our method is superior to other methods. Finally, some examples of new drugs and new targets are predicted by performing simulation experiments. The improved GRMF method can also better predict the remaining drug-target interactions.


Subject(s)
Algorithms; Drug Interactions; Databases as Topic; Humans; Reproducibility of Results
8.
J Imaging ; 10(7)2024 Jun 21.
Article in English | MEDLINE | ID: mdl-39057722

ABSTRACT

Nonmydriatic retinal fundus images often suffer from quality issues and artifacts due to ocular or systemic comorbidities, leading to potential inaccuracies in clinical diagnoses. In recent times, deep learning methods have been widely employed to improve retinal image quality. However, these methods often require large datasets and lack robustness in clinical settings. Conversely, the inherent stability and adaptability of traditional unsupervised learning methods, coupled with their reduced reliance on extensive data, render them more suitable for real-world clinical applications, particularly in the limited data context of high noise levels or a significant presence of artifacts. However, existing unsupervised learning methods encounter challenges such as sensitivity to noise and outliers, reliance on assumptions like cluster shapes, and difficulties with scalability and interpretability, particularly when utilized for retinal image enhancement. To tackle these challenges, we propose a novel robust PCA (RPCA) method with low-rank sparse decomposition that also integrates affine transformations τi, the weighted nuclear norm, and the L2,1 norm, aiming to overcome existing method limitations and to achieve image quality improvement unseen by these methods. We employ the weighted nuclear norm (Lw,∗) to assign weights to the singular values of each retinal image and utilize the L2,1 norm to eliminate correlated samples and outliers in the retinal images. Moreover, τi is employed to enhance retinal image alignment, making the new method more robust to variations, outliers, noise, and image blurring. The Alternating Direction Method of Multipliers (ADMM) is used to optimally determine parameters, including τi, by solving an optimization problem. Each parameter is addressed separately, harnessing the benefits of ADMM. Our method introduces a novel parameter update approach and significantly improves retinal image quality for detecting cataracts and diabetic retinopathy. Simulation results confirm our method's superiority over existing state-of-the-art methods across various datasets.

9.
Comput Biol Chem ; 110: 108078, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38677013

ABSTRACT

MicroRNAs (miRNAs) play a vital role in regulating gene expression and various biological processes. As a result, they have been identified as effective targets for small molecule (SM) drugs in disease treatment. Heterogeneous graph inference stands as a classical approach for predicting SM-miRNA associations, showcasing commendable convergence accuracy and speed. However, most existing methods do not adequately address the inherent sparsity in SM-miRNA association networks, and imprecise SM/miRNA similarity metrics reduce the accuracy of predicting SM-miRNA associations. In this research, we proposed a heterogeneous graph inference with range constrained L2,1-collaborative matrix factorization (HGIRCLMF) method to predict potential SM-miRNA associations. First, we computed the multi-source similarities of SM/miRNA and integrated this similarity information into a comprehensive SM/miRNA similarity. This step improved the accuracy of SM and miRNA similarity, ensuring reliability for the subsequent inference on the heterogeneous graph. Second, we used a range constrained L2,1-collaborative matrix factorization (RCLMF) model to pre-populate the SM-miRNA association matrix with missing values. In this step, we developed a novel matrix decomposition method that enhances the robustness and formative nature of SM-miRNA edges between SM networks and miRNA networks. Next, we built a well-established SM-miRNA heterogeneous network utilizing the processed biological information. Finally, HGIRCLMF used this network data to infer unknown association pair scores. We implemented four cross-validation experiments on two distinct datasets, and HGIRCLMF acquired the highest areas under the curve, surpassing six state-of-the-art computational approaches. Furthermore, we performed three case studies to validate the predictive power of our method in practical application.


Subject(s)
MicroRNAs; MicroRNAs/genetics; Small Molecule Libraries/chemistry; Computational Biology/methods; Algorithms; Humans
10.
Front Genet ; 12: 621317, 2021.
Article in English | MEDLINE | ID: mdl-33708239

ABSTRACT

The dimensionality reduction method accompanied by different norm constraints plays an important role in mining useful information from large-scale gene expression data. In this article, a novel method named Lp-norm and L2,1-norm constrained graph Laplacian principal component analysis (PL21GPCA) based on traditional principal component analysis (PCA) is proposed for robust tumor sample clustering and gene network module discovery. Three aspects are highlighted in the PL21GPCA method. First, to reduce the high sensitivity to outliers and noise, the non-convex proximal Lp-norm (0 < p < 1) constraint is applied on the loss function. Second, to enhance the sparsity of gene expression in cancer samples, the L2,1-norm constraint is used on one of the regularization terms. Third, to retain the geometric structure of the data, we introduce the graph Laplacian regularization term to the PL21GPCA optimization model. Extensive experiments on five gene expression datasets, including one benchmark dataset, two single-cancer datasets from The Cancer Genome Atlas (TCGA), and two integrated datasets of multiple cancers from TCGA, are performed to validate the effectiveness of our method. The experimental results demonstrate that the PL21GPCA method performs better than many other methods in terms of tumor sample clustering. Additionally, this method is used to discover the gene network modules for the purpose of finding key genes that may be associated with some cancers.

11.
Neural Netw ; 128: 126-141, 2020 Aug.
Article in English | MEDLINE | ID: mdl-32446190

ABSTRACT

Multi-view feature extraction methods mainly focus on exploiting the consistency and complementary information between multi-view samples, and most of the current methods apply the F-norm or L2-norm as the metric, which are sensitive to outliers and noise. In this paper, based on the L2,1-norm, we propose a unified robust feature extraction framework, which includes four special multi-view feature extraction methods and extends the state-of-the-art methods to a more generalized form. The proposed methods are less sensitive to outliers and noise. An efficient iterative algorithm is designed to solve the L2,1-norm based methods. Comprehensive analyses, such as convergence analysis, rotational invariance analysis, and the relationship between our methods and previous F-norm based methods, illustrate the effectiveness of our proposed methods. Experiments on two artificial datasets and six real datasets demonstrate that the proposed L2,1-norm based methods have better performance than the related methods.


Subject(s)
Algorithms; Databases, Factual; Pattern Recognition, Automated/methods; Humans
12.
Comput Biol Chem ; 89: 107368, 2020 Dec.
Article in English | MEDLINE | ID: mdl-32919230

ABSTRACT

With the development of cancer research, various gene expression datasets containing cancer information show an explosive growth trend. In addition, due to the continuous maturity of single-cell RNA sequencing (scRNA-seq) technology, the protein information and pedigree information of a single cell are also continuously mined. How to classify these high-dimensional data correctly is a technical problem. In recent years, Extreme Learning Machine (ELM) has been widely used in the field of supervised learning and unsupervised learning. However, the traditional ELM does not consider the robustness of the method. To improve the robustness of ELM, in this paper, a novel ELM method based on the L2,1-norm, named L2,1-Extreme Learning Machine (L2,1-ELM), has been proposed. The method introduces the L2,1-norm on the loss function to improve the robustness, and minimizes the influence of noise and outliers. Firstly, we evaluate the new method on five UCI datasets. The experimental results prove that our method can achieve competitive results. Next, the novel method is applied to the problem of classification of cancer samples and single-cell RNA sequencing datasets. The experimental results on The Cancer Genome Atlas (TCGA) datasets and scRNA-seq datasets prove that ELM and its variants have great potential in the classification of cancer samples.


Subject(s)
Machine Learning; Neoplasms/classification; Algorithms; Databases, Nucleic Acid/statistics & numerical data; Humans
13.
Oncotarget ; 8(29): 48075-48085, 2017 Jul 18.
Article in English | MEDLINE | ID: mdl-28624800

ABSTRACT

The traditional methods of drug discovery follow the "one drug-one target" approach, which ignores the cellular and physiological environment of the action mechanism of drugs. However, pathway-based drug discovery methods can overcome this limitation. Methods of this kind, such as the Integrative Penalized Matrix Decomposition (iPaD) method, identify the drug-pathway associations by imposing a lasso-type penalty on the regularization term. Moreover, instead of imposing the L1-norm regularization, the L2,1-Integrative Penalized Matrix Decomposition (L2,1-iPaD) method imposes the L2,1-norm penalty on the regularization term. In this paper, based on the iPaD and L2,1-iPaD methods, we propose a novel method named L1L2,1-iPaD (L1L2,1-Integrative Penalized Matrix Decomposition), which takes the sum of the L1-norm and L2,1-norm penalties on the regularization term. In addition, we perform a permutation test to assess the significance of the identified drug-pathway association pairs and compute the P-values. Compared with the existing methods, our method can identify more drug-pathway association pairs which have been validated in the CancerResource database. In order to identify drug-pathway associations which are not validated in the CancerResource database, we retrieve published papers to prove these associations. The results on two real datasets prove that our method can achieve better enrichment for identified association pairs than the iPaD and L2,1-iPaD methods.


Subject(s)
Computational Biology/methods; Drug Discovery/methods; Systems Biology/methods; Algorithms; Cell Line, Tumor; Databases, Factual; Humans
14.
BMC Syst Biol ; 11(Suppl 6): 119, 2017 Dec 14.
Article in English | MEDLINE | ID: mdl-29297378

ABSTRACT

BACKGROUND: Traditional drug identification methods follow the "one drug-one target" approach, but those methods ignore the natural characteristics of human diseases. To overcome this limitation, many identification methods of drug-pathway association pairs have been developed, such as the integrative penalized matrix decomposition (iPaD) method. The iPaD method imposes the L1-norm penalty on the regularization term. However, lasso-type penalties have an obvious disadvantage: the sparsity they produce is too dispersive. RESULTS: Therefore, to improve the performance of the iPaD method, we propose a novel method named L2,1-iPaD to identify paired drug-pathway associations. In the L2,1-iPaD model, we use the L2,1-norm penalty to replace the L1-norm penalty since the L2,1-norm penalty can produce row sparsity. CONCLUSIONS: By applying the L2,1-iPaD method to the CCLE and NCI-60 datasets, we demonstrate that the performance of the L2,1-iPaD method is superior to existing methods. A permutation test shows that the proposed method achieves better enrichment in terms of discovering validated drug-pathway association pairs than the iPaD method. The results on the two real datasets prove that our method is effective.
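
The difference between dispersive sparsity and row sparsity can be seen from the proximal (shrinkage) operators of the two penalties. The sketch below uses the standard textbook operators with a row-wise L2,1 convention; it is not taken from the paper, whose solver the abstract does not describe, and the matrix `V` and threshold are hypothetical.

```python
import math

def prox_l1(matrix, t):
    """Entrywise soft-thresholding: proximal operator of t * ||V||_1."""
    return [[math.copysign(max(abs(v) - t, 0.0), v) for v in row] for row in matrix]

def prox_l21(matrix, t):
    """Row-wise shrinkage: proximal operator of t * (sum of row Euclidean norms)."""
    out = []
    for row in matrix:
        norm = math.sqrt(sum(v * v for v in row))
        # A row is scaled towards zero as a whole, and removed if its norm is at most t.
        scale = max(1.0 - t / norm, 0.0) if norm > 0 else 0.0
        out.append([scale * v for v in row])
    return out

# Hypothetical coefficient matrix: a weak first row and a strong second row.
V = [[0.3, -0.4, 0.0],
     [3.0, 0.2, -4.0]]

print(prox_l1(V, 1.0))   # zeros appear inside the strong row as well
print(prox_l21(V, 1.0))  # the weak row vanishes entirely, the strong row stays dense
```

With the L1 operator, individual small entries are zeroed wherever they occur, whereas the L2,1 operator keeps or discards each row as a unit.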


Subject(s)
Drug Discovery/methods; Algorithms; Computational Biology; Datasets as Topic; Humans; Models, Theoretical
15.
Mol Inform ; 36(4)2017 Apr.
Article in English | MEDLINE | ID: mdl-27863104

ABSTRACT

Feature selection has been regarded as an effective tool to help researchers understand the generating process of data. For mining the synthesis mechanism of microporous AlPOs, this paper proposes a novel feature selection method based on joint L2,1 norm and Fisher discrimination constraints (JNFDC). In order to obtain a more effective feature subset, the proposed method proceeds in two steps. The first step is to rank the features according to sparse and discriminative constraints. The second step is to establish a predictive model with the ranked features, and select the most significant features according to their contribution to improving the predictive accuracy. To the best of our knowledge, JNFDC is the first work which employs the sparse representation theory to explore the synthesis mechanism of six kinds of pore rings. Numerical simulations demonstrate that our proposed method can select significant features affecting the specified structural property and improve the predictive accuracy. Moreover, comparison results show that JNFDC can obtain better predictive performance than some other state-of-the-art feature selection methods.


Subject(s)
Aluminum Compounds/chemistry; Phosphates/chemistry; Algorithms; Aluminum Compounds/chemical synthesis; Crystallization; Gels/chemistry; Phosphates/chemical synthesis; Porosity; Solvents/chemistry