Results 1 - 11 of 11
1.
BMC Bioinformatics ; 25(1): 169, 2024 Apr 29.
Article in English | MEDLINE | ID: mdl-38684942

ABSTRACT

Many important biological findings have emerged as single-cell RNA sequencing (scRNA-seq) technology has advanced. With this technology, it is now possible to investigate the connections among individual cells, genes, and diseases. Clustering is frequently used in the analysis of single-cell data. However, biological data usually contain a large amount of noise, to which traditional clustering methods are sensitive, and acquiring higher-order spatial information from the data alone is insufficient. As a result, obtaining trustworthy clustering results is challenging. We propose Cauchy hyper-graph Laplacian non-negative matrix factorization (CHLNMF) to address these issues. In CHLNMF, we replace the Euclidean-distance measure of conventional non-negative matrix factorization (NMF) with the Cauchy loss function (CLF), which lessens the influence of noise. The model also incorporates a hyper-graph constraint, which takes into account the high-order links among the samples. The optimal solution of the CHLNMF model is then found using a half-quadratic optimization approach. Finally, we compare CHLNMF with nine other leading methods on seven scRNA-seq datasets. Analysis of the experimental results confirms the validity of our technique.
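The robustness mechanism described in this abstract can be sketched via the weights a Cauchy loss induces in a half-quadratic scheme. This is only an illustrative numpy sketch, not the paper's CHLNMF algorithm; the scale parameter `c` is a hypothetical choice.

```python
import numpy as np

def cauchy_weights(residuals, c=1.0):
    """Half-quadratic weights induced by the Cauchy loss.

    Under a squared Euclidean loss every residual would get weight 1;
    the Cauchy loss downweights large residuals as 1 / (1 + (r/c)^2).
    """
    return 1.0 / (1.0 + (residuals / c) ** 2)

residuals = np.array([0.1, 0.5, 5.0])  # the last value mimics a noisy entry
w = cauchy_weights(residuals)
```

The noisy entry receives a weight close to zero, which is why a Cauchy-based factorization is less influenced by noise than one based on Euclidean distance.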


Subject(s)
Algorithms , RNA Sequence Analysis , Single-Cell Analysis , Single-Cell Analysis/methods , RNA Sequence Analysis/methods , Humans , Cluster Analysis , Computational Biology/methods
2.
J Bioinform Comput Biol ; 20(2): 2250002, 2022 04.
Article in English | MEDLINE | ID: mdl-35191362

ABSTRACT

Tensor Robust Principal Component Analysis (TRPCA) has achieved promising results in the analysis of genomics data. However, the TRPCA model under the existing tensor singular value decomposition (t-SVD) framework insufficiently extracts the potential low-rank structure of the data, resulting in suboptimal restored components. Moreover, the tensor nuclear norm (TNN) defined on the basis of t-SVD handles all singular values by the same standard. TNN ignores the differences among singular values, so the main information that should be well preserved may be lost. To preserve the heterogeneous structure in the low-rank information, we propose a novel TNN and extend it to the TRPCA model. The potential low-rank space may contain important information, so we learn the low-rank structural information from the core tensor. The singular value space contains the association information between genes and cancers; the [Formula: see text]-shrinkage generalized threshold function is utilized to preserve the low-rank properties of the larger singular values. The optimization problem is solved by the alternating direction method of multipliers (ADMM) algorithm. Clustering and feature selection experiments are performed on the TCGA data set. The experimental results show that the proposed model is more promising than other state-of-the-art tensor decomposition methods.
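The uniform treatment of singular values for which the TNN is criticized corresponds to the standard singular value thresholding (SVT) step used inside nuclear-norm ADMM solvers. A minimal numpy sketch of that baseline step (the threshold `tau` is a hypothetical choice; the paper's generalized shrinkage modifies exactly this uniform rule):

```python
import numpy as np

def svt(M, tau):
    # Singular value thresholding: shrink every singular value by the
    # same amount tau, i.e. the proximal operator of the nuclear norm.
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

rng = np.random.default_rng(0)
low_rank = rng.standard_normal((20, 5)) @ rng.standard_normal((5, 20))
X = svt(low_rank, tau=1.0)  # every singular value reduced by the same tau
```

Because the shrinkage is uniform, large singular values (which carry the main information) are attenuated just as much as small ones; a generalized threshold function preserves them better.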


Subject(s)
Algorithms , Neoplasms , Cluster Analysis , Genomics , Humans , Neoplasms/genetics , Principal Component Analysis
3.
Elife ; 11, 2022 02 15.
Article in English | MEDLINE | ID: mdl-35166670

ABSTRACT

Large-scale multiparameter screening has become increasingly feasible and straightforward to perform thanks to developments in technologies such as high-content microscopy and high-throughput flow cytometry. However, automated toolkits for analyzing similarities and differences between large numbers of tested conditions have not kept pace with these technological developments. Effective analysis of multiparameter screening datasets thus becomes a bottleneck and a limiting factor in the unbiased interpretation of results. Here we introduce compaRe, a toolkit for large-scale multiparameter data analysis, which integrates quality control, data bias correction, and data visualization methods with a mass-aware gridding algorithm-based similarity analysis, providing much faster and more robust analysis than existing methods. Using mass and flow cytometry data from acute myeloid leukemia and myelodysplastic syndrome patients, we show that compaRe can reveal interpatient heterogeneity and recognizable phenotypic profiles. By applying compaRe to high-throughput flow cytometry drug response data in AML models, we robustly identified multiple types of both deep and subtle phenotypic response patterns, highlighting how this analysis could be used for therapeutic discoveries. In conclusion, compaRe is a toolkit that uniquely allows for automated, rapid, and precise comparisons of large-scale multiparameter datasets, including high-throughput screens.


Biology has seen huge advances in technology in recent years. This has led to state-of-the-art techniques which can test hundreds of conditions simultaneously, such as how cancer cells respond to different drugs. In addition, each of the tens of thousands of cells studied can be screened for multiple variables, such as certain proteins or genes. This generates massive datasets with large numbers of parameters, which researchers can use to find similarities and differences between the tested conditions.

Analyzing these 'high-throughput' experiments, however, is no easy task, as the data are often contaminated with meaningless information, or 'background noise', as well as sources of bias, such as non-biological variations between experiments. As a result, most analysis methods can only probe one parameter at a time, or are unautomated and require manual interpretation of the data.

Here, Chalabi Hajkarim et al. have developed a new toolkit that can analyze multiparameter datasets faster and more robustly than current methods. The kit, named 'compaRe', combines a range of computational tools that automatically 'clean' the data of background noise or bias; the different conditions are then compared and any similarities are visually displayed using a graphical interface that is easy to explore.

Chalabi Hajkarim et al. used their new method to study data from patients with acute myeloid leukemia (AML) and myelodysplastic syndrome, two forms of cancer that disrupt the production of functional immune cells. The toolkit was able to identify subtle differences between the patients and categorize them into groups based on the proteins present on immune cells. Chalabi Hajkarim et al. also applied compaRe to high-throughput data on cells from patients and mouse models with AML that had been treated with large numbers of specific drugs. This revealed that different cell types in the samples responded to the treatments in distinct ways.

These findings suggest that the toolkit created by Chalabi Hajkarim et al. can automatically, rapidly and precisely compare large multiparameter datasets collected using high-throughput screens. In the future, compaRe could be used to identify drugs that elicit a specific response, or to predict how newly developed treatments impact different cell types in the body.


Subject(s)
Acute Myeloid Leukemia , Myelodysplastic Syndromes , Algorithms , Flow Cytometry/methods , High-Throughput Screening Assays , Humans , Acute Myeloid Leukemia/drug therapy
4.
Methods Mol Biol ; 2416: 213-237, 2022.
Article in English | MEDLINE | ID: mdl-34870839

ABSTRACT

Over the last decade, RNA-Sequencing (RNA-Seq) has revolutionized the field of transcriptomics due to its sheer advantage over previous technologies for studying gene expression. Even the domain of stem cell bioinformatics has benefited from these advancements, helping researchers look deeper into how pluripotency is maintained by stem cells and how it may be exploited for applications in regenerative medicine. However, as it is still an evolving technology, there is no single accepted protocol for RNA-Seq data analysis. From the wide array of tools and algorithms available for the purpose, researchers tend to develop a pipeline that is best suited to their sample, experimental design, and computational power. In this tutorial, we describe a pipeline based on open-source tools to analyze RNA-Seq data from naïve and primed state human pluripotent stem cell samples. Specifically, we show how RNA-Seq data can be downloaded from databases, processed, and used to identify differentially expressed genes and construct a co-expression network. Further, we show how the list of interesting genes obtained from differential expression testing or the co-expression network can be analyzed to gain biological insights.
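The differential-expression step of such a pipeline can be illustrated with a toy per-gene t-test followed by Benjamini-Hochberg correction. This is a simulated sketch, not the chapter's actual tool chain; the sample sizes, effect size, and 0.05 cutoff are arbitrary choices.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n_genes = 100
naive = rng.normal(0.0, 1.0, size=(n_genes, 5))   # 5 replicates per state
primed = rng.normal(0.0, 1.0, size=(n_genes, 5))
primed[:10] += 5.0                                 # 10 truly DE genes

# Per-gene two-sample t-test.
pvals = np.array([stats.ttest_ind(naive[g], primed[g]).pvalue
                  for g in range(n_genes)])

# Benjamini-Hochberg adjustment: q_(i) = min over j >= i of p_(j) * m / j.
order = np.argsort(pvals)
scaled = pvals[order] * n_genes / (np.arange(n_genes) + 1)
qvals = np.empty(n_genes)
qvals[order] = np.minimum.accumulate(scaled[::-1])[::-1]

de_genes = np.where(qvals < 0.05)[0]  # indices of genes called DE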


Subject(s)
Pluripotent Stem Cells , Transcriptome , Computational Biology , Gene Expression Profiling , Humans , RNA Sequence Analysis
5.
Interdiscip Sci ; 14(1): 22-33, 2022 Mar.
Article in English | MEDLINE | ID: mdl-34115312

ABSTRACT

In recent years, clustering analysis of cancer genomics data has gained widespread attention. However, limited by the dimensions of the matrix, traditional methods cannot fully mine the underlying geometric structure information in the data. Besides, noise and outliers inevitably exist in the data. To solve these two problems, we propose a new method that uses a tensor to represent cancer omics data and applies a hypergraph to preserve the geometric structure information of the original data. This model is called hypergraph regularized tensor robust principal component analysis (HTRPCA). HTRPCA decomposes the data into two parts: a low-rank component that contains the pure underlying structure information between samples, and a sparse component of interference points. The low-rank component can then be used for clustering. Thanks to the hypergraph regularization, the model retains complex geometric information between more sample points. Clustering experiments on TCGA datasets demonstrate the effectiveness of HTRPCA, with results showing that it outperforms other advanced methods.


Subject(s)
Algorithms , Neoplasms , Cluster Analysis , Genomics , Humans , Neoplasms/genetics , Principal Component Analysis
6.
Math Biosci Eng ; 18(6): 8951-8961, 2021 10 18.
Article in English | MEDLINE | ID: mdl-34814330

ABSTRACT

The proportion of cancerous cells in a tumor sample, known as "tumor purity", is a major confounding factor in cancer data analyses. Many computational methods are available for estimating tumor purity from different types of genomics data or platforms, which makes it difficult to compare and integrate the estimated results. To rectify the deviation caused by the tumor purity effect, a number of methods for downstream data analysis have been developed, including tumor sample clustering, association studies, and differential methylation between tumor samples. However, using these computational tools remains a daunting task for many researchers, since they require non-trivial computational skills. To this end, we present Purimeth, an integrated web-based tool for estimating and accounting for tumor purity in cancer DNA methylation studies. Purimeth implements three state-of-the-art methods for tumor purity estimation from DNA methylation array data: InfiniumPurify, MEpurity, and PAMES. It also provides a graphical interface for various analyses, including differential methylation (DM), sample clustering, and purification of tumor methylomes, all with consideration of tumor purities. In addition, Purimeth catalogs tumor purities estimated for TCGA samples by nine methods for users to visualize and explore. In conclusion, Purimeth provides an easy way for researchers to explore tumor purity and implement cancer methylation data analysis. It is developed using Shiny (Version 1.6.0) and freely available at http://purimeth.comp-epi.com/.


Subject(s)
DNA Methylation , Neoplasms , Cluster Analysis , Humans , Internet , Neoplasms/genetics
7.
J Bioinform Comput Biol ; 19(1): 2050047, 2021 02.
Article in English | MEDLINE | ID: mdl-33410727

ABSTRACT

Non-negative Matrix Factorization (NMF) has become a popular data dimension reduction method in recent years. The traditional NMF method is highly sensitive to noise in the data. In this paper, we propose a model called Sparse Robust Graph-regularized Non-negative Matrix Factorization based on Correntropy (SGNMFC). Maximized correntropy replaces the traditional minimized Euclidean distance to improve the robustness of the algorithm. Through its kernel function, correntropy assigns less weight to outliers and noise in the data and greater weight to meaningful data. Meanwhile, the geometric structure of the high-dimensional data is preserved in the low-dimensional manifold through graph regularization. Feature selection and sample clustering are commonly used methods for analyzing genes. Sparse constraints are applied to the loss function to reduce matrix complexity and analysis difficulty. In comparisons with five similar methods, the effectiveness of the SGNMFC model is demonstrated by differentially expressed gene selection and sample clustering experiments on three The Cancer Genome Atlas (TCGA) datasets.
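The correntropy weighting described above comes from a Gaussian kernel on the residuals. A minimal numpy sketch, not the SGNMFC algorithm itself; the bandwidth `sigma` is a hypothetical choice:

```python
import numpy as np

def correntropy_weights(residuals, sigma=1.0):
    # Gaussian-kernel weights: near 1 for small residuals (meaningful
    # data), near 0 for large residuals (outliers and noise).
    return np.exp(-residuals ** 2 / (2.0 * sigma ** 2))

r = np.array([0.1, 0.5, 5.0])  # the last value mimics an outlier
w = correntropy_weights(r)
```

Maximizing correntropy is equivalent to a weighted least-squares problem with these weights, which is what makes the factorization robust to non-Gaussian noise.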


Subject(s)
Algorithms , Computational Biology/methods , Gene Expression , Neoplasms/genetics , Cluster Analysis , Computer Graphics , Statistical Data Interpretation , Genetic Databases , Neoplastic Gene Expression Regulation , Humans
8.
Front Genet ; 10: 1054, 2019.
Article in English | MEDLINE | ID: mdl-31824556

ABSTRACT

Non-negative matrix factorization (NMF) is a matrix decomposition method based on the squared loss function. To exploit cancer information, cancer gene expression data are often reduced in dimensionality with NMF. However, gene expression data usually contain noise and outliers, and the original NMF loss function is very sensitive to non-Gaussian noise. To improve the robustness and clustering performance of the algorithm, we propose a sparse graph-regularized NMF based on the Huber loss model for cancer data analysis (Huber-SGNMF). Huber loss is a function between the L1-norm and L2-norm that can effectively handle non-Gaussian noise and outliers. Taking into account matrix sparsity and data geometry information, sparse penalty and graph regularization terms are introduced into the model to enhance matrix sparsity and capture the data manifold structure. We first analyzed the robustness of Huber-SGNMF relative to other models. Experiments on The Cancer Genome Atlas (TCGA) data show that Huber-SGNMF performs better than other state-of-the-art methods in sample clustering and differentially expressed gene selection.
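The Huber loss mentioned above interpolates between the two norms: quadratic (L2-like) for small residuals, linear (L1-like) beyond a threshold. A small numpy sketch; `delta` is a hypothetical threshold choice:

```python
import numpy as np

def huber(residuals, delta=1.0):
    # Quadratic for |r| <= delta, linear beyond: robust to large
    # outliers while remaining smooth near zero.
    r = np.abs(residuals)
    return np.where(r <= delta, 0.5 * r ** 2, delta * (r - 0.5 * delta))

loss = huber(np.array([0.5, 1.0, 10.0]))
# The squared loss 0.5 * r**2 would give 50.0 for the outlier;
# Huber gives only 9.5, so the outlier dominates the fit far less.
```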

9.
Hum Hered ; 84(1): 47-58, 2019.
Article in English | MEDLINE | ID: mdl-31466072

ABSTRACT

Principal component analysis (PCA) is a widely used method for evaluating low-dimensional data. Several variants of PCA have been proposed to improve the interpretation of the principal components (PCs). One of the most common is sparse PCA, which aims at finding a sparse basis to improve interpretability over the dense basis of PCA. However, the performance of these improved methods is still far from satisfactory because the data still contain redundant PCs. In this paper, a novel method called PCA based on graph Laplacian and double sparse constraints (GDSPCA) is proposed to improve the interpretation of the PCs while considering the internal geometry of the data. In detail, GDSPCA utilizes L2,1-norm and L1-norm regularization terms simultaneously to enforce sparsity on the matrix by filtering out redundant and irrelevant PCs, where the L2,1-norm regularization term produces row sparsity and the L1-norm regularization term enforces element sparsity. In this way, the new PCs can be better interpreted in the low-dimensional subspace. Meanwhile, GDSPCA integrates the graph Laplacian into PCA to explore the geometric structure hidden in the data. A simple and effective optimization solution is provided. Extensive experiments on multi-view biological data demonstrate the feasibility and effectiveness of the proposed approach.
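GDSPCA's double sparse constraints are not available off the shelf, but the basic effect of an L1 penalty on loadings can be seen by contrasting scikit-learn's SparsePCA with plain PCA. An illustrative sketch on random data; the component count and `alpha` are arbitrary choices:

```python
import numpy as np
from sklearn.decomposition import PCA, SparsePCA

rng = np.random.default_rng(0)
X = rng.standard_normal((50, 20))

dense = PCA(n_components=3).fit(X)
sparse = SparsePCA(n_components=3, alpha=1.0, random_state=0).fit(X)

# An L1 penalty drives many loadings to exactly zero, which is what
# makes sparse components easier to interpret than dense ones.
n_zeros_dense = int(np.sum(dense.components_ == 0))
n_zeros_sparse = int(np.sum(sparse.components_ == 0))
```

The L2,1 penalty in GDSPCA goes further by zeroing entire rows of the loading matrix, which discards redundant PCs wholesale rather than individual entries.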


Subject(s)
Algorithms , Principal Component Analysis , Cluster Analysis , Neoplastic Gene Expression Regulation , Humans , Neoplasms/genetics
10.
Int J Mol Sci ; 20(4), 2019 Feb 18.
Article in English | MEDLINE | ID: mdl-30781701

ABSTRACT

Feature selection and sample clustering play an important role in bioinformatics. Traditional feature selection methods separate sparse regression from embedding learning. To identify the significant features of genomic data more effectively, Joint Embedding Learning and Sparse Regression (JELSR) was proposed. However, since genomic data contain much redundancy and noise, the sparseness of this method is far from sufficient. In this paper, we propose a strengthened version of JELSR, called LJELSR, that adds an L1-norm constraint on the regularization term of the previous model to further improve its sparseness. We then provide a new iterative algorithm to obtain the convergent solution. Experimental results show that, compared to previous methods, our method achieves state-of-the-art performance in both identifying differentially expressed genes and sample clustering on different genomic data. Additionally, the selected differentially expressed genes may be of great value in medical research.


Subject(s)
Algorithms , Cluster Analysis , Colonic Neoplasms/genetics , Databases as Topic , Esophageal Neoplasms/genetics , Gene Expression Profiling , Humans , Regression Analysis
11.
Biol Proced Online ; 20: 5, 2018.
Article in English | MEDLINE | ID: mdl-29507534

ABSTRACT

BACKGROUND: Hierarchical sample clustering (HSC) is widely performed to examine associations within expression data obtained from microarrays and RNA sequencing (RNA-seq). Researchers have investigated HSC results under several possible grouping criteria (e.g., sex, age, and disease type). However, the evaluation of arbitrarily defined groups still relies on subjective visual inspection. RESULTS: To objectively evaluate the degree of separation between groups of interest in an HSC dendrogram, we propose using Silhouette scores. Silhouettes was originally developed as a graphical aid for the validation of data clusters. It measures how well a sample is classified when assigned to a cluster, according to both the tightness of the clusters and the separation between them. It ranges from 1.0 to -1.0, and a larger value of the average silhouette (AS) over all samples indicates a higher degree of cluster separation. The basic idea of using an AS here is to replace the term "cluster" with "group" when calculating the scores. We investigated the validity of this score using simulated and real data designed for differential expression (DE) analysis. We found that larger (or smaller) AS values agreed well with both higher (or lower) degrees of separation between groups and higher percentages of differentially expressed genes (PDEG). We also found that AS values were generally independent of the number of replicates (Nrep). Although the PDEG values depended on Nrep, we confirmed that both AS and PDEG values were close to zero when the samples showed an intermingled nature between the groups in the HSC dendrogram. CONCLUSION: Silhouettes is useful for exploring data with predefined group labels. It can provide both an objective evaluation of HSC dendrograms and insights into DE results with regard to the compared groups.
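The "replace cluster by group" idea is straightforward to reproduce: pass predefined group labels, rather than cluster assignments, to a silhouette routine. A sketch with scikit-learn on simulated data; the group sizes and separation are arbitrary choices:

```python
import numpy as np
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(0)
# Two predefined groups (e.g. two disease types), 20 samples each.
group_a = rng.normal(0.0, 1.0, size=(20, 2))
group_b = rng.normal(5.0, 1.0, size=(20, 2))
X = np.vstack([group_a, group_b])
labels = np.array([0] * 20 + [1] * 20)  # group labels, not cluster labels

well_separated = silhouette_score(X, labels)  # average silhouette (AS)

# Shuffling the labels mimics groups that intermingle in the dendrogram:
# the AS drops toward zero.
intermingled = silhouette_score(X, rng.permutation(labels))
```

A high AS for the true labels and a near-zero AS for shuffled labels is exactly the behavior the paper reports for well-separated versus intermingled groups.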
