Results 1 - 9 of 9
1.
IEEE/ACM Trans Comput Biol Bioinform ; 19(2): 1154-1164, 2022.
Article in English | MEDLINE | ID: mdl-33026977

ABSTRACT

The rapid development of single-cell RNA sequencing (scRNA-seq) technology reveals the gene expression status and gene structure of individual cells, reflecting the heterogeneity and diversity of cells. Traditional methods of scRNA-seq data analysis treat the data as lying in a single subspace, which hides structural information held in other subspaces. In this paper, we propose a low-rank subspace ensemble clustering framework (LRSEC) to analyze scRNA-seq data. Assuming that the scRNA-seq data exist in multiple subspaces, the low-rank model is used to find the lowest-rank representation of the data in the subspace. It is worth noting that the penalty factor of the low-rank kernel function is uncertain, and different penalty factors correspond to different low-rank structures. Moreover, a single clustering model has difficulty finding the cellular structure of all datasets. To strengthen the correlation between model solutions, we construct a new ensemble clustering framework, LRSEC, using the low-rank model as the base learner. The LRSEC framework captures the global structure of the data through low-rank subspaces and achieves better clustering performance than a single clustering model. We validate the performance of the LRSEC framework on seven small datasets and one large dataset and obtain satisfactory results.


Subject(s)
Algorithms , Single-Cell Analysis , Cluster Analysis , RNA Sequence Analysis , Single-Cell Analysis/methods
2.
IEEE J Biomed Health Inform ; 26(1): 458-467, 2022 01.
Article in English | MEDLINE | ID: mdl-34156956

ABSTRACT

The development of single-cell RNA sequencing (scRNA-seq) technology has made it possible to measure gene expression levels at the resolution of a single cell, which further reveals complex cellular growth processes such as mutation and differentiation. Recognizing cell heterogeneity is one of the most critical tasks in scRNA-seq research. To address it, we propose a non-negative matrix factorization framework based on multi-subspace cell similarity learning for unsupervised scRNA-seq data analysis (MscNMF). MscNMF includes three parts: data decomposition, similarity learning, and similarity fusion. The three work together to complete the data similarity learning task. MscNMF can learn the gene features and cell features of different subspaces, and the correlation and heterogeneity between cells become more prominent across multiple subspaces. The redundant information and noise in each low-dimensional feature space are eliminated, and the gene weight information can be further analyzed to calculate the optimal number of subpopulations. The final cell similarity learning is more satisfactory because cell similarity information from different subspaces is fused. The advantage of MscNMF is that it can reasonably calculate both the number of cell types and the rank of the non-negative matrix factorization (NMF). Experiments on eight real scRNA-seq datasets show that MscNMF can effectively perform clustering tasks and extract useful genetic markers. To verify its clustering performance, the framework is compared with other recent clustering algorithms, and satisfactory results are obtained. The MscNMF code is freely available for academic use (https://github.com/wangchuanyuan1/project-MscNMF).


Subject(s)
Algorithms , Single-Cell Analysis , Cluster Analysis , Gene Expression Profiling , Genetic Markers , Humans , RNA Sequence Analysis/methods , Single-Cell Analysis/methods
3.
Interdiscip Sci ; 13(3): 476-489, 2021 Sep.
Article in English | MEDLINE | ID: mdl-34076860

ABSTRACT

High-throughput sequencing of single-cell gene expression reveals the complex mechanisms underlying the heterogeneity of individual cells in a population. An important purpose of analyzing single-cell RNA sequencing (scRNA-seq) data is to identify cell subtypes and functions by cell clustering. To deal with high levels of noise and cellular heterogeneity, we introduce a new single-cell data analysis model called Adaptive Total-Variation Regularized Low-Rank Representation (ATV-LRR). In scRNA-seq data, ATV-LRR can reconstruct the low-rank subspace structure to learn the similarity of cells. The low-rank representation can not only segment multiple linear subspaces but also extract important information. Moreover, adaptive total variation can also remove cell noise and preserve cell feature details by learning the gradient information of the data. At the same time, to analyze scRNA-seq data with unknown prior information, we introduce the maximum eigenvalue method into the ATV-LRR model to automatically identify cell populations. The final clustering results show that the ATV-LRR model can detect cell types more effectively and stably.


Subject(s)
RNA-Seq , Algorithms , Cluster Analysis , Gene Expression Profiling , Single-Cell Analysis
4.
J Bioinform Comput Biol ; 19(1): 2050047, 2021 02.
Article in English | MEDLINE | ID: mdl-33410727

ABSTRACT

Non-negative Matrix Factorization (NMF) has become a popular dimensionality reduction method in recent years. The traditional NMF method is highly sensitive to data noise. In this paper, we propose a model called Sparse Robust Graph-regularized Non-negative Matrix Factorization based on Correntropy (SGNMFC). Maximizing correntropy replaces the traditional minimization of Euclidean distance to improve the robustness of the algorithm. Through the kernel function, correntropy gives less weight to outliers and noise in the data and greater weight to meaningful data. Meanwhile, the geometric structure of the high-dimensional data is preserved in the low-dimensional manifold through graph regularization. Feature selection and sample clustering are commonly used methods for analyzing genes. Sparse constraints are applied to the loss function to reduce matrix complexity and analysis difficulty. Compared with five other similar methods, the effectiveness of the SGNMFC model is demonstrated by differentially expressed gene selection and sample clustering experiments on three datasets from The Cancer Genome Atlas (TCGA).


Subject(s)
Algorithms , Computational Biology/methods , Gene Expression , Neoplasms/genetics , Cluster Analysis , Computer Graphics , Statistical Data Interpretation , Genetic Databases , Neoplastic Gene Expression Regulation , Humans
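The abstract above says that the kernel function lets correntropy down-weight outliers. The sketch below illustrates that idea only: it assumes a Gaussian kernel and the per-residual weight that commonly appears when correntropy is optimized by iterative reweighting. The function name `correntropy_weight` and the bandwidth `sigma` are hypothetical, and the exact form used in SGNMFC is not given in the abstract.

```python
import math


def correntropy_weight(error: float, sigma: float = 1.0) -> float:
    """Gaussian-kernel weight for one residual (assumed form, not from the paper).

    The weight is 1 for a zero residual and decays toward 0 as the
    residual grows, so outliers contribute little to the objective.
    """
    return math.exp(-(error ** 2) / (2.0 * sigma ** 2))


# Illustrative residuals: two small ones and one outlier.
residuals = [0.1, 0.5, 5.0]
weights = [correntropy_weight(r) for r in residuals]
for r, w in zip(residuals, weights):
    print(f"residual={r:4.1f}  weight={w:.6f}")
```

A larger `sigma` makes the weights decay more slowly, so the choice of bandwidth controls how aggressively large residuals are suppressed.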
5.
IEEE/ACM Trans Comput Biol Bioinform ; 18(6): 2375-2383, 2021.
Article in English | MEDLINE | ID: mdl-32086220

ABSTRACT

Non-negative matrix factorization (NMF) is a dimensionality reduction technique based on high-dimensional mapping. It can learn part-based representations effectively. In this paper, we propose a method called Dual Hyper-graph Regularized Supervised Non-negative Matrix Factorization (HSNMF). To encode the geometric information of the data, the hyper-graph is introduced into the model as a regularization term. The advantage of hyper-graph learning is that it finds higher-order relationships in the data, which enhances data relevance. The method constructs a data hyper-graph and a feature hyper-graph to find the data manifold and the feature manifold simultaneously. Applying hyper-graph theory to cancer datasets can effectively find pathogenic genes. Discriminative information is further introduced into the objective function to obtain more information about the data. Supervised learning with label information greatly improves classification performance. Furthermore, real cancer datasets usually contain sparse noise, so the L2,1-norm is applied to enhance the robustness of the HSNMF algorithm. Experiments on The Cancer Genome Atlas (TCGA) datasets verify the feasibility of the HSNMF method.


Subject(s)
Algorithms , Computational Biology/methods , Neoplasms , Genetic Databases , Humans , Neoplasms/classification , Neoplasms/genetics
6.
Front Genet ; 10: 1054, 2019.
Article in English | MEDLINE | ID: mdl-31824556

ABSTRACT

Non-negative matrix factorization (NMF) is a matrix decomposition method based on the square loss function. To exploit cancer information, the NMF method is often applied to cancer gene expression data to reduce dimensionality. Gene expression data usually contain noise and outliers, and the original NMF loss function is very sensitive to non-Gaussian noise. To improve the robustness and clustering performance of the algorithm, we propose a sparse graph-regularized NMF based on the Huber loss for cancer data analysis (Huber-SGNMF). Huber loss is a function between the L1-norm and the L2-norm that can effectively handle non-Gaussian noise and outliers. Taking into account matrix sparsity and data geometry information, sparse penalty and graph regularization terms are introduced into the model to enhance matrix sparsity and capture the data manifold structure. Before the experiments, we first analyzed the robustness of Huber-SGNMF and other models. Experiments on The Cancer Genome Atlas (TCGA) data show that Huber-SGNMF performs better than other state-of-the-art methods in sample clustering and differentially expressed gene selection.
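
To make the statement that Huber loss sits between the L1-norm and the L2-norm concrete, here is a minimal sketch of the standard scalar Huber loss: quadratic for small residuals and linear for large ones. The threshold `delta` and the sample residuals are illustrative assumptions, and the sketch does not reproduce the Huber-SGNMF objective, which the abstract does not spell out.

```python
def huber_loss(residual: float, delta: float = 1.0) -> float:
    """Standard Huber loss for a single residual.

    Quadratic (L2-like) when |residual| <= delta, linear (L1-like)
    beyond it; the two pieces meet continuously at |residual| = delta.
    """
    a = abs(residual)
    if a <= delta:
        return 0.5 * a * a
    return delta * (a - 0.5 * delta)


def squared_loss(residual: float) -> float:
    """Half squared error, the loss that plain NMF penalizes."""
    return 0.5 * residual * residual


# Illustrative residuals: the last one plays the role of an outlier.
for r in [0.1, -0.5, 8.0]:
    print(f"r={r:5.1f}  squared={squared_loss(r):7.3f}  huber={huber_loss(r):6.3f}")
```

For the large residual the squared loss grows quadratically while the Huber loss grows only linearly, which is why a single outlier influences the fit less.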

7.
Huan Jing Ke Xue ; 36(4): 1256-62, 2015 Apr.
Article in Chinese | MEDLINE | ID: mdl-26164898

ABSTRACT

Using laser particle size and X-ray diffraction (XRD) analysis, 28 sediment samples collected from the inshore region of the Yellow River estuary in October 2013 were analyzed to assess the influence of long-term implementation of the flow-sediment regulation scheme (FSRS, initiated in 2002) on the distributions of grain size and clay components (smectite, illite, kaolinite and chlorite) in sediments. Results showed that, after the FSRS had been implemented for more than 10 years, although the proportion of sand in inshore sediments of the Yellow River estuary was higher (average value, 23.5%) than in sediments of the Bohai Sea and the Yellow River, silt predominated (average value, 59.1%) and clay components were relatively low (average value, 17.4%). The clay components in sediments of the inshore region of the Yellow River estuary were close to those in the Yellow River. The situation changed greatly with the implementation of the FSRS since 2002, and the clay components were in the order illite > smectite > chlorite > kaolinite. This study also indicated that, compared with large-scale investigation in the Bohai Sea, a local study of the inshore region of the Yellow River estuary was more favorable for revealing the effects of long-term implementation of the FSRS on the sedimentation environment of the Yellow River estuary.


Subject(s)
Aluminum Silicates , Estuaries , Geologic Sediments , China , Clay , Kaolin , Minerals , Rivers , X-Ray Diffraction
8.
Huan Jing Ke Xue ; 36(2): 457-63, 2015 Feb.
Article in Chinese | MEDLINE | ID: mdl-26031070

ABSTRACT

Estuaries are important areas contributing to the global carbon cycle. To analyze the spatial-temporal distribution characteristics of dissolved inorganic carbon (DIC) in the surface water of the Yellow River estuary, samples were collected in spring, summer, fall, and winter of 2013, and the correlation between DIC content and environmental factors was examined. The results show that the DIC concentration of the surface water in the Yellow River estuary ranges from 26.34 to 39.43 mg·L⁻¹, and that the DIC concentration on the freshwater side is higher than that on the seaward side. In some areas where the salinity is less than 15‰, the DIC concentration shows significant losses, with a maximum loss of 20.46%. In descending order, the seasonal DIC levels are spring, fall, winter, summer. Principal component analysis shows that water temperature, suspended solids, salinity, and chlorophyll a are the main factors affecting the variation of the DIC concentration in surface water, with a contribution rate as high as 83%, while alkalinity, pH, dissolved organic carbon, dissolved oxygen, and other factors cannot be ignored. The loss of DIC in the low-salinity area is due to calcium carbonate sedimentation. DIC presents a gradually increasing trend, which is mainly due to the effects of water retention time, temperature, external input, and environmental conditions.


Subject(s)
Carbon/analysis , Environmental Monitoring , Estuaries , Seasons , Calcium Carbonate , Carbon Isotopes , China , Chlorophyll , Chlorophyll A , Rivers , Spatio-Temporal Analysis , Temperature
9.
Huan Jing Ke Xue ; 28(3): 659-63, 2007 Mar.
Article in Chinese | MEDLINE | ID: mdl-17633651

ABSTRACT

The geochemical characteristics of radon and mercury in soil gas in Lhasa and its vicinity were investigated based on measurements of Rn and Hg concentrations, and the environmental quality for Rn and Hg in soil gas was evaluated by means of the index of geoaccumulation. Data from 1,579 sampling sites indicate that the environmental-geochemical background values of Rn and Hg are 7,634.9 Bq/m³ and 41.5 ng/m³, with standard deviations of 2.7 Bq/m³ and 2.2 ng/m³, respectively. The environmental quality for Rn in soil gas is better in the western and eastern parts of the study area but reaches moderate pollution (level III) in the northern part of the central area. Rn is derived from radioactive elements in granitic sediments in the intermountain basin and in the granite base, which are the major sources of pollution. The environmental quality for Hg in soil gas deteriorates gradually from the suburbs toward the urban center, where the highest pollution reaches level IV. The background of Hg in soil gas is mainly controlled by the composition of sediments, whereas the Hg pollution is caused by human waste and the religious use of mercury.


Subject(s)
Mercury/analysis , Radon/analysis , Radioactive Soil Pollutants/analysis , Soil Pollutants/analysis , Environmental Monitoring , Tibet
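
The abstract above evaluates environmental quality with the index of geoaccumulation but does not state the formula. As a hedged illustration, the sketch below uses the commonly cited definition, Igeo = log2(C / (k × B)) with k = 1.5, where C is the measured concentration and B the background value. The function name, the sample concentration, and the use of this exact form are assumptions; the thresholds behind the levels reported in the abstract are not given there, so no classification is attempted.

```python
import math


def geoaccumulation_index(concentration: float, background: float, k: float = 1.5) -> float:
    """Igeo = log2(C / (k * B)), the commonly cited form (assumed here).

    k = 1.5 is the conventional factor that allows for natural
    variation in the background value.
    """
    if concentration <= 0 or background <= 0:
        raise ValueError("concentration and background must be positive")
    return math.log2(concentration / (k * background))


# Background values quoted in the abstract.
RN_BACKGROUND = 7634.9  # Bq/m3
HG_BACKGROUND = 41.5    # ng/m3

# Hypothetical measurement, chosen only to show the calculation.
sample_rn = 30000.0  # Bq/m3, illustrative value not taken from the paper
print(f"Igeo(Rn) = {geoaccumulation_index(sample_rn, RN_BACKGROUND):.2f}")
```

An index of 0 corresponds to a concentration of exactly k times the background, and each additional unit corresponds to a doubling of the concentration.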