ABSTRACT
Single-cell RNA sequencing (scRNA-seq) has emerged as a powerful tool for investigating cellular heterogeneity through high-throughput analysis of individual cells. Nevertheless, prevalent dropout events and technical noise complicate downstream analyses. Here, we introduce a novel algorithm, Single-cell Gene Importance Ranking (scGIR), which uses a single-cell gene correlation network to evaluate gene importance. The algorithm transforms single-cell sequencing data into a robust gene correlation network through statistical independence, with correlation edges weighted by gene expression levels. We then construct a random walk model on the resulting weighted gene correlation network to rank the importance of genes. Our analysis of gene importance using the PageRank algorithm across nine real scRNA-seq datasets indicates that scGIR can effectively overcome technical noise, enabling the identification of cell types and the inference of developmental trajectories. We demonstrate that the gene correlation edges, weighted by expression, play a critical role in enhancing the algorithm's performance. Our findings show that scGIR performs strongly in improving the clustering of cell subtypes, in the reverse identification of differentially expressed marker genes, and in uncovering genes with potential differential importance. Overall, we propose a promising method capable of extracting more information from single-cell RNA sequencing datasets, potentially shedding new light on cellular processes and disease mechanisms.
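The random-walk ranking described in this abstract can be illustrated with a minimal sketch of PageRank by power iteration on a weighted network. This is not the scGIR implementation: the function name `weighted_pagerank`, the toy gene labels, the edge weights, and the damping factor of 0.85 are all assumptions chosen for illustration, and how scGIR derives its weights from expression levels is not reproduced here.

```python
def weighted_pagerank(weights, damping=0.85, tol=1e-10, max_iter=1000):
    """PageRank by power iteration.

    weights: dict mapping node -> {neighbor: nonnegative edge weight}.
    The walker moves to a neighbor with probability proportional to
    the edge weight (hypothetical stand-in for expression weighting).
    """
    nodes = sorted(set(weights) | {v for nbrs in weights.values() for v in nbrs})
    n = len(nodes)
    rank = {u: 1.0 / n for u in nodes}
    out_sum = {u: sum(weights.get(u, {}).values()) for u in nodes}

    for _ in range(max_iter):
        # Rank held by nodes without outgoing weight is spread uniformly.
        dangling = sum(rank[u] for u in nodes if out_sum[u] == 0)
        new = {u: (1 - damping) / n + damping * dangling / n for u in nodes}
        for u in nodes:
            if out_sum[u] == 0:
                continue
            for v, w in weights[u].items():
                new[v] += damping * rank[u] * w / out_sum[u]
        delta = sum(abs(new[u] - rank[u]) for u in nodes)
        rank = new
        if delta < tol:
            break
    return rank


# Toy symmetric network: genes A and B are strongly linked, C weakly.
toy = {
    "A": {"B": 5.0, "C": 1.0},
    "B": {"A": 5.0, "C": 1.0},
    "C": {"A": 1.0, "B": 1.0},
}
scores = weighted_pagerank(toy)
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking, scores)
```

In this toy network the two strongly connected genes receive equal scores that exceed the score of the weakly connected gene, which is the behavior one would expect from a weight-proportional random walk.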
Subjects
Gene Regulatory Networks , Single-Cell Analysis , Sequence Analysis, RNA/methods , Single-Cell Analysis/methods , Algorithms , Cluster Analysis , Gene Expression Profiling/methods
ABSTRACT
As the fundamental units of a gene and its transcripts, nucleotides have enormous impacts on gene function and evolution, and thus on phenotypes and diseases. To identify the key nucleotides of a specific gene, it is crucial to quantitatively measure the importance of each base in the gene. However, there are still no sequence-based methods for doing so. Here, we propose Base Importance Calculator (BIC), an algorithm that calculates an importance score for each single base from the sequence information of human mRNAs and long noncoding RNAs (lncRNAs). We then confirmed its power by applying BIC to three different tasks. First, we revealed that BIC can effectively evaluate the pathogenicity of both genes and single bases through single nucleotide variations. Moreover, the BIC scores of somatic mutations in The Cancer Genome Atlas are able to predict the prognosis of some cancers. Finally, we show that BIC can also precisely predict the transmissibility of SARS-CoV-2. These results indicate that BIC is a useful tool for evaluating single-base importance in human mRNAs and lncRNAs.
Subjects
COVID-19 , RNA, Long Noncoding , Humans , COVID-19/genetics , RNA, Long Noncoding/genetics , SARS-CoV-2/genetics , Algorithms , Nucleotides , RNA, Messenger/genetics
ABSTRACT
Potato is one of the four most important food crop plants worldwide and is strongly affected by drought. Two pairs of potato cultivars, each related in ancestry but differing in drought tolerance, were chosen for comparative gene expression studies: Gwiazda/Oberon and Tajfun/Owacja. Comparative RNA-seq analyses were performed on the transcriptomes of drought-tolerant versus drought-sensitive plants under water shortage conditions. The 23 top-ranking genes were selected, 22 of which are described here as novel potato drought-responsive genes. Moreover, all but one of the selected potato genes have homologues in the Arabidopsis genome. Of the seven tested A. thaliana mutants with altered expression of the selected homologous genes, six showed improved tolerance to drought compared to wild-type Arabidopsis plants. These genes encode a carbohydrate transporter, mitogen-activated protein kinase kinase kinase 15 (MAPKKK15), serine carboxypeptidase-like 19 protein (SCPL19), an armadillo/beta-catenin-like repeat-containing protein, high-affinity nitrate transporter 2.7 and nonspecific lipid transfer protein type 2 (nsLTP). The evolutionary conservation of the functions of the selected genes in the plant response to drought confirms the importance of these identified potato genes in the ability of plants to cope with water shortage. Knowledge of these gene functions can be used to generate potato cultivars that are resistant to unfavourable conditions. The approach used in this work and the results obtained allowed the identification of new players in the plant response to drought.
Subjects
Droughts , Solanum tuberosum/metabolism , Solanum tuberosum/physiology , Arabidopsis/genetics , Arabidopsis/microbiology , Arabidopsis Proteins/genetics , Arabidopsis Proteins/metabolism , Gene Expression Regulation, Plant , Plant Proteins/genetics , Plant Proteins/metabolism , Plants, Genetically Modified/genetics , Plants, Genetically Modified/metabolism , Plants, Genetically Modified/physiology , Solanum tuberosum/genetics
ABSTRACT
BACKGROUND: A gene regulatory network (GRN) is a model that characterizes the complex relationships between genes and thereby provides an informatics environment in which to measure the importance of nodes. Evaluating important nodes in a GRN can reveal their functional roles as key players in particular biological processes, such as master regulators and driver genes. Currently, such evaluation is mainly based on network topological parameters and assesses each node individually. However, genes and their products function by interacting with each other, and the effects of gene combinations in a GRN are not simply additive. Discovering key combinations is therefore important for revealing gene sets with important functions. Recently, with the development of single-cell RNA-sequencing (scRNA-seq) technology, we can quantify the gene expression profiles of individual cells, which provides the potential to identify crucial nodes in gene regulation under specific conditions, e.g., stem cell differentiation. RESULTS: In this paper, we propose a bioinformatics method, called Pseudo Knockout Importance (PKI), to quantify the importance of nodes and node sets in a specific GRN structure using time-course scRNA-seq data. First, we construct ordinary differential equations (ODEs) to approximate gene regulation during cell differentiation. Then we design gene pseudo knockout experiments and define PKI score evaluation criteria based on the coefficient of determination. The importance of a node can be described as the influence on the ODE system of removing the corresponding variable. For key gene combinations, PKI is formulated as a combinatorial optimization problem that quantifies the in silico gene knockout effects. CONCLUSIONS: Here, we focus our analyses on the specific GRN of embryonic stem cells with time-series gene expression profiles. To verify the effectiveness and advantages of the PKI method, we compare its node importance rankings with those of twelve centrality-based methods, such as degree and Latora closeness. For key node combinations, we compare the results with a method based on the minimum dominating set. Moreover, well-known combinations of transcription factors used to induce pluripotent stem cells are also employed to verify the vital gene combinations identified by PKI. These results demonstrate the reliability and superiority of the proposed method.
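The pseudo knockout idea in this abstract, scoring a node by how much the fit of the ODE system degrades when its variable is removed, can be sketched on a single target gene. This is a heavily simplified illustration and not the PKI method itself: it assumes a linear rate equation fitted by ordinary least squares without an intercept, and the helper names (`solve`, `r_squared`, `knockout_scores`), the regulator labels `g1` and `g2`, and the toy data are all hypothetical.

```python
def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def r_squared(X, y):
    """Fit y ~ X w by least squares (no intercept) and return R^2."""
    k = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    w = solve(xtx, xty)
    pred = [sum(wi * xi for wi, xi in zip(w, row)) for row in X]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - pi) ** 2 for yi, pi in zip(y, pred))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


def knockout_scores(X, y, names):
    """Score each regulator by the drop in R^2 when its column is removed."""
    full = r_squared(X, y)
    scores = {}
    for j, name in enumerate(names):
        reduced = [[v for i, v in enumerate(row) if i != j] for row in X]
        scores[name] = full - r_squared(reduced, y)
    return scores


# Toy data: rows are time points, columns are regulator expression levels.
X = [[1.0, 0.5], [2.0, 0.1], [3.0, 0.9], [4.0, 0.3], [5.0, 0.7]]
# Assumed rate of change of the target gene: driven mostly by g1.
y = [2.0 * a + 0.1 * b for a, b in X]
scores = knockout_scores(X, y, ["g1", "g2"])
print(scores)
```

Removing the dominant regulator produces the larger loss of explained variance, so it receives the higher score. Scoring gene combinations would require searching over subsets of columns, which the abstract describes as a combinatorial optimization problem and which this sketch does not attempt.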
Subjects
Gene Expression Regulation , Gene Regulatory Networks , Reproducibility of Results , Computational Biology/methods , Transcription Factors/metabolism
ABSTRACT
The m6A modification has been implicated as an important epitranscriptomic marker that plays extensive roles in the regulation of transcript stability, splicing, translation, and localization. Nevertheless, only some genes are repeatedly modified across various conditions, and the principle of m6A regulation remains elusive. In this study, we performed a systems-level analysis of human genes frequently regulated by m6A modification (m6Afreq genes) and those occasionally regulated by m6A modification (m6Aocca genes). Compared to the m6Aocca genes, the m6Afreq genes exhibit features related to gene importance, such as a lower dN/dS ratio, higher degree in the protein-protein interaction network, and reduced tissue expression specificity. Signaling network analysis indicates that the m6Afreq genes are associated with downstream components of signaling cascades, highly linked signaling adaptors, and specific network motifs such as incoherent feed-forward loops. Moreover, functional enrichment analysis indicates significant overlaps between the m6Afreq genes and genes involved in various layers of gene expression, such as microRNA targets and regulators of RNA processing. Therefore, our findings suggest a potential interplay between m6A epitranscriptomic regulation and other gene expression regulatory machineries.
Subjects
Adenosine/analogs & derivatives , Gene Expression Regulation , Adenosine/metabolism , Gene Regulatory Networks , Humans , MicroRNAs/metabolism , Organ Specificity , Signal Transduction
ABSTRACT
rlying biology of differentially expressed genes and proteins. Although various approaches have been proposed to identify cancer-related pathways, most of them only partially consider the influence of those differentially expressed genes, such as the number of genes, their perturbation in signal transduction, and the interactions between genes. Signaling-pathway impact analysis (SPIA) provides a convenient framework that considers both classical enrichment analysis and the actual perturbation on a given pathway. In this study, we extended the previously proposed SPIA by incorporating the importance and specificity of genes (SPIA-IS). We applied this approach to six datasets for colorectal cancer, lung cancer, and pancreatic cancer. Results from these datasets showed that the proposed SPIA-IS could effectively improve the performance of the original SPIA in identifying cancer-related pathways.
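As a small illustration of how the original SPIA combines its two lines of evidence, the sketch below computes a global pathway p-value from an enrichment p-value and a perturbation p-value using the combination c - c ln(c), with c the product of the two. The function name `spia_global_p` and the example inputs are assumptions, and the gene importance and specificity weighting that SPIA-IS adds is not described in the abstract and is not modeled here.

```python
import math


def spia_global_p(p_nde: float, p_pert: float) -> float:
    """Combine two independent p-values as in the original SPIA.

    p_nde:  p-value from classical over-representation (enrichment).
    p_pert: p-value from the pathway perturbation analysis.
    Returns c - c * ln(c), where c = p_nde * p_pert.
    """
    c = p_nde * p_pert
    if c <= 0.0:
        return 0.0  # limit of c - c*ln(c) as c approaches 0
    return c - c * math.log(c)


# Hypothetical pathway with moderate evidence from both analyses.
combined = spia_global_p(0.05, 0.05)
print(combined)
```

The combined value is larger than the raw product of the two p-values, because the product of two uniform p-values is itself not uniformly distributed and must be corrected before it is read as a p-value.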
Subjects
Colorectal Neoplasms/genetics , Computational Biology , Lung Neoplasms/genetics , Pancreatic Neoplasms/genetics , Signal Transduction/genetics , Colorectal Neoplasms/metabolism , Databases, Genetic , Gene Regulatory Networks , Humans , Lung Neoplasms/metabolism , Pancreatic Neoplasms/metabolism
ABSTRACT
Hierarchical clustering (HC) is one of the most widely used unsupervised statistical techniques for analyzing microarray gene expression data. When applied to gene expression data to cluster individuals, most HC algorithms generate clusters based on the highly differentially expressed (DE) genes that have very similar expression patterns. These highly DE genes may sometimes be irrelevant to biological processes. The serious problem is that such irrelevant, highly expressed genes can drown out lowly expressed genes that have important biological functions. To overcome this problem, Nowak and Tibshirani proposed complementary hierarchical clustering (CHC) (Biostatistics, 9, 467-483, 2008). However, CHC is not robust against outlying expression values and often produces misleading results if the gene expression data are contaminated. Thus, we propose the robust CHC (RCHC) method, which robustifies CHC with respect to outliers by maximizing the β-likelihood function for sequential extraction of a gene set with proper groups of individuals. Note that the proposed method reduces to CHC as the tuning parameter β → 0. The value of β plays a key role in the performance of the RCHC method, as it controls the tradeoff between the robustness and efficiency of the estimators. In simulations and real gene expression analysis, the RCHC method is robust to data contamination in gene expression clustering, overcomes the problem of CHC, and predicts critically important genes from breast cancer data.
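The role of the tuning parameter β can be illustrated on a much simpler problem than RCHC: estimating a location parameter with β-dependent weights that shrink the influence of outliers. The sketch below is only an analogy under stated assumptions, namely a Gaussian model with a fixed scale `sigma`, a fixed-point iteration started from the median, and made-up data; the function name `beta_weighted_mean` is hypothetical, and the clustering and gene-set extraction steps of RCHC are not reproduced.

```python
import math


def beta_weighted_mean(values, beta, sigma=1.0, n_iter=100):
    """Location estimate with weights exp(-beta * (x - mu)^2 / (2 sigma^2)).

    With beta = 0 every weight equals 1 and the result is the ordinary
    sample mean; larger beta down-weights points far from the estimate.
    """
    mu = sorted(values)[len(values) // 2]  # start from a median-like value
    for _ in range(n_iter):
        w = [math.exp(-beta * (x - mu) ** 2 / (2.0 * sigma ** 2)) for x in values]
        mu = sum(wi * xi for wi, xi in zip(w, values)) / sum(w)
    return mu


# Five well-behaved expression values and one gross outlier (toy data).
data = [1.0, 1.2, 0.8, 1.1, 0.9, 25.0]
classical = beta_weighted_mean(data, beta=0.0)
robust = beta_weighted_mean(data, beta=0.5)
print(classical, robust)
```

With β = 0 the outlier pulls the estimate far from the bulk of the data, whereas a positive β keeps it near the well-behaved values. A larger β gives more protection against contamination at the cost of statistical efficiency on clean data, which is the tradeoff the abstract attributes to this parameter.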