Results 1 - 20 of 24
1.
Nucleic Acids Res ; 50(D1): D391-D401, 2022 01 07.
Article in English | MEDLINE | ID: mdl-34718747

ABSTRACT

Transcription co-factors (TcoFs) play crucial roles in gene expression regulation by communicating regulatory cues from enhancers to promoters. With the rapid accumulation of TcoF-associated chromatin immunoprecipitation sequencing (ChIP-seq) data, comprehensive collection and integrative analysis of these data are urgently required. Here, we developed the TcoFBase database (http://tcof.liclab.net/TcoFbase), which aimed to document a large number of available resources for mammalian TcoFs and provided annotations and enrichment analyses of TcoFs. TcoFBase curated 2322 TcoFs and 6759 TcoF-associated ChIP-seq datasets from over 500 tissues/cell types in human and mouse. Importantly, TcoFBase provided detailed and abundant (epi)genetic annotations of ChIP-seq-based TcoF binding regions. Furthermore, TcoFBase supported regulatory annotation information and various functional annotations for TcoFs. Meanwhile, TcoFBase embedded five types of TcoF regulatory analyses for users: TcoF gene set enrichment, TcoF binding genomic region annotation, TcoF regulatory network analysis, TcoF-TF co-occupancy analysis and TcoF regulatory axis analysis. TcoFBase was designed to be a useful resource that will help reveal the potential biological effects of TcoFs and elucidate TcoF-related regulatory mechanisms.


Subjects
Databases, Genetic; Gene Regulatory Networks; Software; Transcription Factors/genetics; Transcription, Genetic; Animals; Chromatin/chemistry; Chromatin/metabolism; Datasets as Topic; Enhancer Elements, Genetic; Gene Expression Regulation; Humans; Internet; Mice; Molecular Sequence Annotation; Promoter Regions, Genetic; Transcription Factors/classification; Transcription Factors/metabolism
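The TcoF-TF co-occupancy analysis listed above amounts, at its core, to finding where TcoF and transcription factor binding regions overlap. The following is a minimal Python sketch of that step, not TcoFBase's implementation: it assumes peaks are half-open (chromosome, start, end) intervals as in BED files, and the function name `co_occupied`, the `min_overlap` parameter and the toy peaks are hypothetical.

```python
from collections import defaultdict

# (chromosome, start, end), half-open coordinates as in BED files
Interval = tuple[str, int, int]


def co_occupied(
    tcof_peaks: list[Interval], tf_peaks: list[Interval], min_overlap: int = 1
) -> list[tuple[Interval, Interval]]:
    """Return (TcoF peak, TF peak) pairs that share at least `min_overlap` bases."""
    # Group TF peaks by chromosome and sort by start so the scan can stop early.
    by_chrom: dict[str, list[Interval]] = defaultdict(list)
    for peak in tf_peaks:
        by_chrom[peak[0]].append(peak)
    for peaks in by_chrom.values():
        peaks.sort(key=lambda p: p[1])

    pairs = []
    for tcof in tcof_peaks:
        chrom, start, end = tcof
        for tf in by_chrom.get(chrom, []):
            if tf[1] >= end:  # this and all later TF peaks start past the TcoF peak
                break
            overlap = min(end, tf[2]) - max(start, tf[1])
            if overlap >= min_overlap:
                pairs.append((tcof, tf))
    return pairs


# Hypothetical peaks.
tcof = [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 10, 50)]
tf = [("chr1", 150, 250), ("chr1", 300, 400), ("chr2", 50, 80)]
print(co_occupied(tcof, tf))  # [(('chr1', 100, 200), ('chr1', 150, 250))]
```

Sorting each chromosome's TF peaks by start lets the inner loop stop early; for genome-scale peak sets an interval tree or a dedicated tool such as bedtools would be the more usual choice.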
2.
Nucleic Acids Res ; 49(W1): W317-W325, 2021 07 02.
Article in English | MEDLINE | ID: mdl-34086934

ABSTRACT

Gene set enrichment (GSE) analysis plays an essential role in extracting biological insight from genome-scale experiments. ORA (overrepresentation analysis), FCS (functional class scoring) and PT (pathway topology) approaches are three successive generations of GSE methods. Previous versions of KOBAS provided services based only on the ORA method. Here we present version 3.0 of KOBAS, named KOBAS-i (short for KOBAS intelligent version). It introduces a novel machine learning-based method we published earlier, CGPS, which incorporates seven FCS tools and two PT tools into a single ensemble score and intelligently prioritizes the relevant biological pathways. In addition, KOBAS has expanded the downstream exploratory visualization for selecting and understanding the enriched results. The tool constructs a novel view, cirFunMap, which presents different enriched terms and their correlations in a landscape. Finally, building on the previous version's framework, KOBAS increased the number of supported species from 1327 to 5944. For easier local runs, it also provides a prebuilt Docker image that requires no installation, as a supplement to the source code version. KOBAS can be freely accessed at http://kobas.cbi.pku.edu.cn, and a mirror site is available at http://bioinfo.org/kobas.


Subjects
Genes; Software; Gene Expression; Gene Ontology; Machine Learning; Proteins/genetics
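ORA, the method earlier KOBAS versions relied on, is commonly scored with a one-sided hypergeometric test: with N background genes, K of them annotated to a term, and a query list of n genes of which k carry that annotation, the p-value is P(X ≥ k). The sketch below shows that test plus Benjamini-Hochberg adjustment across terms using only the standard library; it illustrates the general ORA technique under these assumptions and is not KOBAS code. The gene counts are invented.

```python
from math import comb


def ora_pvalue(N: int, K: int, n: int, k: int) -> float:
    """One-sided hypergeometric p-value P(X >= k).

    N: background genes, K: background genes annotated to the term,
    n: genes in the query list, k: query genes annotated to the term.
    """
    upper = min(K, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


def benjamini_hochberg(pvalues: list[float]) -> list[float]:
    """Benjamini-Hochberg adjusted p-values (false discovery rate), in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, keeping adjusted values monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical: 20 000 background genes, a 200-gene pathway, and a 300-gene
# query list containing 12 pathway genes (about 3 expected by chance).
p = ora_pvalue(N=20_000, K=200, n=300, k=12)
print(p, benjamini_hochberg([p, 0.04, 0.3]))
```

Python's arbitrary-precision integers keep the binomial coefficients exact, so the only rounding happens in the final division.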
3.
Nucleic Acids Res ; 49(D1): D1197-D1206, 2021 01 08.
Article in English | MEDLINE | ID: mdl-33264402

ABSTRACT

Pharmacotranscriptomics has become a powerful approach for evaluating the therapeutic efficacy of drugs and discovering new drug targets. Recently, studies of traditional Chinese medicine (TCM) have increasingly turned to high-throughput transcriptomic screens for the molecular effects of herbs/ingredients. Numerous studies have also examined gene targets for herbs/ingredients and linked herbs/ingredients to various modern diseases. However, there is currently no systematic database organizing these data for TCM. Therefore, we built HERB, a high-throughput experiment- and reference-guided database of TCM, with the Chinese name BenCaoZuJian. We re-analyzed 6164 gene expression profiles from 1037 high-throughput experiments evaluating TCM herbs/ingredients, and generated connections between TCM herbs/ingredients and 2837 modern drugs by mapping the comprehensive pharmacotranscriptomics dataset in HERB to CMap, the largest such dataset for modern drugs. Moreover, we manually curated 1241 gene targets and 494 modern diseases for 473 herbs/ingredients from 1966 recently published references, and cross-referenced this novel information to databases containing such data for drugs. Together with database mining and statistical inference, we linked 12 933 targets and 28 212 diseases to 7263 herbs and 49 258 ingredients and provided six pairwise relationships among them in HERB. In summary, HERB will strongly support the modernization of TCM and guide rational modern drug discovery efforts. It is accessible through http://herb.ac.cn/.


Subjects
Databases, Factual; Drugs, Chinese Herbal/therapeutic use; Medicine, Chinese Traditional/methods; Pharmacogenetics/methods; Software; Animals; Computational Biology/methods; Datasets as Topic; Drugs, Chinese Herbal/chemistry; High-Throughput Screening Assays; Humans; Internet; Mice; Molecular Targeted Therapy/methods; Plant Extracts/chemistry; Plant Extracts/therapeutic use; Transcriptome
4.
Nucleic Acids Res ; 49(D1): D165-D171, 2021 01 08.
Article in English | MEDLINE | ID: mdl-33196801

ABSTRACT

NONCODE (http://www.noncode.org/) is a comprehensive database for the collection and annotation of noncoding RNAs, especially long non-coding RNAs (lncRNAs), in animals. NONCODEV6 is dedicated to providing the full scope of lncRNAs across plants and animals. The number of lncRNAs in NONCODEV6 has increased from 548 640 to 644 510 since the last update in 2017. The number of human lncRNAs has increased from 172 216 to 173 112, and the number of mouse lncRNAs from 131 697 to 131 974. The number of plant lncRNAs is 94 697. The relationships between human lncRNAs and cancer were updated with transcriptome sequencing profiles. Three important new features were also introduced in NONCODEV6: (i) updated human lncRNA-disease relationships, especially for cancer; (ii) lncRNA annotations with tissue expression profiles and predicted function in five common plants; (iii) lncRNA conservation annotation at the transcript level for 23 plant species. NONCODEV6 is accessible through http://www.noncode.org/.


Subjects
Databases, Nucleic Acid; Neoplasms/genetics; RNA, Long Noncoding/genetics; RNA, Messenger/genetics; Software; Transcriptome; Animals; Base Sequence; Conserved Sequence; Exons; Gene Expression Profiling; Humans; Internet; Mice; Molecular Sequence Annotation; Neoplasms/classification; Neoplasms/metabolism; Neoplasms/pathology; Plants/genetics; RNA, Long Noncoding/classification; RNA, Long Noncoding/metabolism; RNA, Messenger/classification; RNA, Messenger/metabolism
5.
Nucleic Acids Res ; 47(W1): W516-W522, 2019 07 02.
Article in English | MEDLINE | ID: mdl-31147700

ABSTRACT

As more and more high-throughput data are produced by next-generation sequencing, classifying RNA transcripts as protein-coding or non-coding remains a challenge, especially for poorly annotated species. We upgraded our original coding potential calculator, CNCI (Coding-Non-Coding Index), to CNIT (Coding-Non-Coding Identifying Tool), which provides faster and more accurate evaluation of the coding ability of RNA transcripts. CNIT runs ∼200 times faster than CNCI and is more accurate (0.98 versus 0.94 for human, 0.95 versus 0.93 for mouse, 0.93 versus 0.92 for zebrafish, 0.93 versus 0.92 for fruit fly, 0.92 versus 0.88 for worm, and 0.98 versus 0.85 for Arabidopsis transcripts). Moreover, the AUC values for 11 animal species and 27 plant species showed that CNIT obtains relatively accurate identification results for almost all eukaryotic transcripts. In addition, a mobile-friendly web server is now freely available at http://cnit.noncode.org/CNIT.


Subjects
Proteins/genetics; RNA, Long Noncoding/chemistry; Sequence Analysis, RNA; Software; Animals; High-Throughput Nucleotide Sequencing; Humans; Internet; Mice; Neural Cell Adhesion Molecule L1/genetics
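The accuracy and AUC figures quoted for CNIT measure how well a coding-potential score separates coding from non-coding transcripts. AUC equals the probability that a randomly chosen coding transcript scores higher than a randomly chosen non-coding one, with ties counting one half. A minimal sketch of that definition follows; the scores are made up and this is not CNIT's evaluation code.

```python
def roc_auc(pos_scores: list[float], neg_scores: list[float]) -> float:
    """AUC as P(pos > neg) + 0.5 * P(pos == neg) over all positive/negative pairs."""
    if not pos_scores or not neg_scores:
        raise ValueError("need at least one positive and one negative score")
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count half
    return wins / (len(pos_scores) * len(neg_scores))


# Hypothetical coding-potential scores.
coding = [0.9, 0.8, 0.4]
noncoding = [0.3, 0.5]
print(roc_auc(coding, noncoding))  # 5/6 ≈ 0.833
```

The pairwise loop is O(n·m); for large transcript sets a rank-based (Mann-Whitney U) formulation gives the same value faster.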
6.
Nucleic Acids Res ; 47(D1): D1110-D1117, 2019 01 08.
Article in English | MEDLINE | ID: mdl-30380087

ABSTRACT

Recently, the pharmaceutical industry has heavily emphasized phenotypic drug discovery (PDD), which relies primarily on knowledge about phenotype changes associated with diseases. Traditional Chinese medicine (TCM) provides a massive amount of information on natural products and the clinical symptoms they are used to treat, which are the observable disease phenotypes that are crucial for clinical diagnosis and treatment. Curating knowledge of TCM symptoms and their relationships to herbs and diseases will provide both candidate leads and screening directions for evidence-based PDD programs. Therefore, we present SymMap, an integrative database of traditional Chinese medicine enhanced by symptom mapping. We manually curated 1717 TCM symptoms and related them to 499 herbs and 961 symptoms used in modern medicine based on a committee of 17 leading experts practicing TCM. Next, we collected 5235 diseases associated with these symptoms, 19 595 herbal constituents (ingredients) and 4302 target genes, and built a large heterogeneous network containing all of these components. Thus, SymMap integrates TCM with modern medicine in common aspects at both the phenotypic and molecular levels. Furthermore, we inferred all pairwise relationships among SymMap components using statistical tests to give pharmaceutical scientists the ability to rank and filter promising results to guide drug discovery. The SymMap database can be accessed at http://www.symmap.org/ and https://www.bioinfo.org/symmap.


Subjects
Computational Biology/methods; Databases, Factual; Drugs, Chinese Herbal/therapeutic use; Medicine, Chinese Traditional/methods; Molecular Targeted Therapy/methods; Gene Regulatory Networks/drug effects; Gene Regulatory Networks/genetics; Humans; Information Storage and Retrieval/methods; Internet; Medicine, Chinese Traditional/statistics & numerical data; Phytotherapy/methods
7.
J Cell Physiol ; 235(4): 3569-3578, 2020 04.
Article in English | MEDLINE | ID: mdl-31556110

ABSTRACT

Studies have shown that microRNAs (miRNAs) play a vital role in tumor progression and patient prognosis. We therefore aimed to construct a miRNA model for forecasting the survival of hepatocellular carcinoma (HCC) patients. The gene expression data of 433 patients with HCC from The Cancer Genome Atlas (TCGA) and Gene Expression Omnibus public databases were re-mined by survival analysis and receiver operating characteristic (ROC) curve analysis. A prognostic model including six miRNAs (hsa-mir-26a-1-3p, hsa-mir-188-5p, hsa-mir-212-5p, hsa-mir-149-5p, hsa-mir-105-5p, and hsa-mir-132-5p) was constructed in the training dataset (TCGA, n = 333). HCC patients were stratified into a high-risk group and a low-risk group with significantly different survival (median: 2.75 vs. 8.93 years, log-rank test p < .001). We then validated its stratification performance in an independent dataset (GSE116182, median: 2.55 vs. 6.96 years, log-rank test p = .008). Cox regression analysis showed that the prognostic model was an independent prognostic indicator for HCC patients. Time-dependent ROC analyses comparing the prognostic ability of the model with that of TNM staging showed that the model performed better, especially at 5 years (AUC = 0.76). Functional prediction showed that the genes targeted by the six prognostic miRNAs in the model were highly expressed in the P53-related pathway. In conclusion, we constructed a prognostic miRNA model that could indicate the survival of HCC patients.


Subjects
Carcinoma, Hepatocellular/genetics; Liver Neoplasms/genetics; MicroRNAs/genetics; Tumor Suppressor Protein p53/genetics; Adolescent; Adult; Aged; Aged, 80 and over; Biomarkers, Tumor; Carcinoma, Hepatocellular/pathology; Disease-Free Survival; Female; Gene Expression Regulation, Neoplastic/genetics; Humans; Kaplan-Meier Estimate; Liver Neoplasms/pathology; Male; Middle Aged; Neoplasm Staging; Prognosis; Risk Factors; Transcriptome/genetics; Young Adult
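Prognostic models like this six-miRNA model are commonly expressed as a risk score: the sum of each feature's expression weighted by its Cox regression coefficient, with patients split into high- and low-risk groups at a cutoff such as the median. The abstract does not report the coefficients or the cutoff, so the miRNA names (`miR-A`, `miR-B`), weights and expression values below are placeholders; this sketches the general pattern, not the published model.

```python
from statistics import median
from typing import Optional


def risk_scores(
    expression: dict[str, dict[str, float]], coefficients: dict[str, float]
) -> dict[str, float]:
    """Linear risk score sum(beta_f * x_f) per patient; expression[patient][feature]."""
    return {
        patient: sum(beta * values[feature] for feature, beta in coefficients.items())
        for patient, values in expression.items()
    }


def stratify(scores: dict[str, float], cutoff: Optional[float] = None) -> dict[str, str]:
    """Label patients 'high' or 'low' risk; the cutoff defaults to the median score."""
    if cutoff is None:
        cutoff = median(scores.values())
    return {p: "high" if s > cutoff else "low" for p, s in scores.items()}


# Hypothetical weights and expression values (not from the study).
coefficients = {"miR-A": 0.5, "miR-B": -0.3}
expression = {
    "P1": {"miR-A": 2.0, "miR-B": 1.0},
    "P2": {"miR-A": 1.0, "miR-B": 2.0},
    "P3": {"miR-A": 3.0, "miR-B": 0.0},
    "P4": {"miR-A": 0.0, "miR-B": 1.0},
}
train_scores = risk_scores(expression, coefficients)
print(stratify(train_scores))  # {'P1': 'high', 'P2': 'low', 'P3': 'high', 'P4': 'low'}
```

Passing the training-set cutoff explicitly, rather than recomputing the median, is one common way to apply a fixed model to an independent cohort such as GSE116182.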
8.
Nucleic Acids Res ; 46(D1): D308-D314, 2018 01 04.
Article in English | MEDLINE | ID: mdl-29140524

ABSTRACT

NONCODE (http://www.bioinfo.org/noncode/) is a systematic database dedicated to presenting the most complete collection and annotation of non-coding RNAs (ncRNAs), especially long non-coding RNAs (lncRNAs). Since NONCODE 2016 was released two years ago, the number of newly identified ncRNAs has grown with the reduced cost of next-generation sequencing, which has produced an explosion of newly identified data. The third-generation sequencing revolution has also offered longer and more accurate annotations. Moreover, accumulating evidence confirmed by biological experiments has provided more comprehensive knowledge of lncRNA functions. The ncRNA data set was expanded by collecting newly identified ncRNAs from literature published over the past two years and by integrating the latest versions of RefSeq and Ensembl. Additionally, pig was included in the database for the first time, bringing the total number of species to 17. The number of lncRNAs in NONCODEv5 increased from 527 336 to 548 640. NONCODEv5 also introduced three important new features: (i) human lncRNA-disease relationships and single nucleotide polymorphism-lncRNA-disease relationships were constructed; (ii) human exosome lncRNA expression profiles were displayed; (iii) the RNA secondary structures of NONCODE human transcripts were predicted. NONCODEv5 is also accessible through http://www.noncode.org/.


Subjects
Databases, Genetic; Molecular Sequence Annotation; RNA, Long Noncoding/genetics; RNA, Long Noncoding/metabolism; Animals; Disease/genetics; Exosomes/genetics; Exosomes/metabolism; Gene Expression Profiling; Humans; Mice; Nucleic Acid Conformation; Polymorphism, Single Nucleotide; RNA, Long Noncoding/chemistry
9.
Brief Bioinform ; 18(5): 789-797, 2017 09 01.
Article in English | MEDLINE | ID: mdl-27439532

ABSTRACT

RNA-seq technology offers the promise of rapid, comprehensive discovery of long intervening noncoding RNAs (lincRNAs). Basic tools such as TopHat and Cufflinks have been widely used for RNA-seq assembly. However, advanced bioinformatics methodologies that allow in-depth analysis of lincRNAs are lacking. Here, we describe a computational protocol especially designed for the identification of novel lincRNAs and the prediction of their function. The protocol mainly includes two open-access tools, CNCI and ncFANs. CNCI allows users to distinguish noncoding from protein-coding transcripts and to retrieve novel lincRNAs. ncFANs integrates expression profiles of protein-coding and lincRNA genes to construct coexpression networks. Such networks are subsequently used to predict the functions of unknown lincRNAs. This protocol will allow users to apply these procedures without the need for additional training. All the tools in the current protocol are available at http://www.bioinfo.org/np/.


Subjects
RNA, Long Noncoding/genetics; Computational Biology; Proteins
10.
Nucleic Acids Res ; 44(D1): D203-8, 2016 Jan 04.
Article in English | MEDLINE | ID: mdl-26586799

ABSTRACT

NONCODE (http://www.bioinfo.org/noncode/) is an interactive database that aims to present the most complete collection and annotation of non-coding RNAs, especially long non-coding RNAs (lncRNAs). The recently reduced cost of RNA sequencing has produced an explosion of newly identified data. Revolutionary third-generation sequencing methods have also contributed to more accurate annotations. Accumulated experimental data also provide more comprehensive knowledge of lncRNA functions. In this update, NONCODE has added six new species, bringing the total to 16 species. The number of lncRNAs in NONCODE has increased from 210 831 to 527 336. For human and mouse, the lncRNA numbers are 167 150 and 130 558, respectively. NONCODE 2016 has also introduced three important new features: (i) conservation annotation; (ii) the relationships between lncRNAs and diseases; and (iii) an interface to choose high-quality datasets through predicted scores, literature support and long-read sequencing method support. NONCODE is also accessible through http://www.noncode.org/.


Subjects
Databases, Nucleic Acid; RNA, Long Noncoding/genetics; Animals; Base Sequence; Cattle; Conserved Sequence; Disease/genetics; Humans; Mice; Molecular Sequence Annotation; RNA, Long Noncoding/metabolism; Rats
11.
GigaByte ; 2024: gigabyte108, 2024.
Article in English | MEDLINE | ID: mdl-38434931

ABSTRACT

As genomic sequencing technology continues to advance, it becomes increasingly important to perform joint analyses of multiple transcriptomics datasets. However, batch effects present challenges for dataset integration, for example when data are measured on different sequencing platforms or collected at different times. Here, we report the development of BatchEval Pipeline, a workflow used to evaluate batch effects in dataset integration. The BatchEval Pipeline generates a comprehensive report, consisting of a series of HTML pages for assessment findings: a main page, a raw dataset evaluation page, and several built-in method evaluation pages. The main page shows basic information on the integrated datasets, a comprehensive batch effect score, and the recommended method for removing batch effects from the current datasets. The remaining pages show evaluation details for the raw dataset and evaluation results after batch effect removal by each built-in method. This comprehensive report enables researchers to accurately identify and remove batch effects, resulting in more reliable and meaningful biological insights from integrated datasets. In summary, the BatchEval Pipeline represents a significant advancement in batch effect evaluation and is a valuable tool for improving the accuracy and reliability of experimental results. Availability & Implementation: The source code of the BatchEval Pipeline is available at https://github.com/STOmics/BatchEval.

12.
GigaByte ; 2024: gigabyte109, 2024.
Article in English | MEDLINE | ID: mdl-38440167

ABSTRACT

This paper introduces a new approach to cell clustering using the Variable Neighborhood Search (VNS) metaheuristic. The method clusters cells based on both gene expression and spatial coordinates. We first formulated this clustering challenge as an Integer Linear Programming minimization problem. Our approach introduced a novel model based on the VNS technique, demonstrating its efficacy in navigating the complexities of cell clustering. Notably, our method extends beyond conventional cell-type clustering to spatial domain clustering. This adaptability enables our algorithm to form clusters based on information from gene expression matrices and spatial coordinates. Our validation showed the superior performance of our method compared with existing techniques. Our approach advances current clustering methodologies and can potentially be applied to several fields, from biomedical research to spatial data analysis.

13.
Gigascience ; 13(1)2024 01 02.
Article in English | MEDLINE | ID: mdl-38373746

ABSTRACT

BACKGROUND: The emergence of high-resolution spatial transcriptomics (ST) has facilitated the development of novel methods to investigate biological development, organism growth, and other complex biological processes. However, high-resolution, whole-transcriptome ST datasets require customized imputation methods to improve the signal-to-noise ratio and data quality. FINDINGS: We propose an efficient and adaptive Gaussian smoothing (EAGS) imputation method for high-resolution ST. The adaptive 2-factor smoothing of EAGS creates patterns based on the spatial and expression information of the cells, creates adaptive weights for smoothing cells in the same pattern, and then uses the weights to restore the gene expression profiles. We assessed the performance and efficiency of EAGS using simulated and high-resolution ST datasets of mouse brain and olfactory bulb. CONCLUSIONS: Compared with other competitive methods, EAGS shows higher clustering accuracy, better biological interpretability, and significantly reduced computational cost.


Subjects
Magnetic Resonance Imaging; Transcriptome; Animals; Mice; Magnetic Resonance Imaging/methods; Gene Expression Profiling; Normal Distribution; Signal-To-Noise Ratio
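EAGS restores expression by Gaussian smoothing over spatially neighbouring cells. The sketch below shows only the plain, non-adaptive version of that idea: each cell's expression vector is replaced by an average over cells within a fixed radius, weighted by exp(-d²/(2σ²)). EAGS's pattern construction and adaptive 2-factor weights are not reproduced, and `sigma`, `radius` and the toy data are assumed values.

```python
from math import dist, exp


def gaussian_smooth(
    coords: list[tuple[float, float]],
    expr: list[list[float]],
    sigma: float = 1.0,
    radius: float = 3.0,
) -> list[list[float]]:
    """Replace each cell's expression vector with a Gaussian-weighted neighbour average."""
    n_genes = len(expr[0])
    smoothed = []
    for ci in coords:
        weights, rows = [], []
        for cj, row in zip(coords, expr):
            d = dist(ci, cj)
            if d <= radius:  # the cell itself (d = 0) is always included
                weights.append(exp(-d * d / (2 * sigma * sigma)))
                rows.append(row)
        total = sum(weights)
        smoothed.append(
            [sum(w * r[g] for w, r in zip(weights, rows)) / total for g in range(n_genes)]
        )
    return smoothed


# Three cells on a line; the third is too far away to be smoothed with the others.
coords = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0)]
expr = [[0.0, 4.0], [2.0, 4.0], [7.0, 1.0]]
print(gaussian_smooth(coords, expr))
```

The all-pairs loop is quadratic in the number of cells; at Stereo-seq scale a spatial index such as SciPy's `cKDTree` would typically be used to find neighbours.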
14.
Gigascience ; 13, 2024 01 02.
Article in English | MEDLINE | ID: mdl-39028588

ABSTRACT

BACKGROUND: Integrative analysis of spatially resolved transcriptomics datasets empowers a deeper understanding of complex biological systems. However, integrating multiple tissue sections presents challenges for batch effect removal, particularly when the sections are measured by various technologies or collected at different times. FINDINGS: We propose spatiAlign, an unsupervised contrastive learning model that employs the expression of all measured genes and the spatial location of cells, to integrate multiple tissue sections. It enables the joint downstream analysis of multiple datasets not only in low-dimensional embeddings but also in the reconstructed full expression space. CONCLUSIONS: In benchmarking analysis, spatiAlign outperforms state-of-the-art methods in learning joint and discriminative representations for tissue sections, each potentially characterized by complex batch effects or distinct biological characteristics. Furthermore, we demonstrate the benefits of spatiAlign for the integrative analysis of time-series brain sections, including spatial clustering, differential expression analysis, and particularly trajectory inference that requires a corrected gene expression matrix.


Subjects
Gene Expression Profiling; Transcriptome; Unsupervised Machine Learning; Gene Expression Profiling/methods; Computational Biology/methods; Humans; Algorithms; Animals; Cluster Analysis; Brain/metabolism
15.
Nat Commun ; 15(1): 7806, 2024 Sep 06.
Article in English | MEDLINE | ID: mdl-39242563

ABSTRACT

Three-dimensional spatial transcriptomics has revolutionized our understanding of tissue regionalization, organogenesis, and development. However, existing approaches overlook either spatial information or experiment-induced distortions, leading to significant discrepancies between reconstruction results and in vivo cell locations and causing unreliable downstream analysis. To address these challenges, we propose ST-GEARS (Spatial Transcriptomics GEospatial profile recovery system through AnchoRS). By incorporating innovative Distributive Constraints into the optimization scheme, ST-GEARS retrieves, with high precision, anchors that connect the closest spots across sections in vivo. Guided by the anchors, it first rigidly aligns sections, then solves and denoises Elastic Fields to counteract distortions. Through a mathematically proven Bi-sectional Fields Application, it finally recovers the original spatial profile. Evaluating ST-GEARS across varying numbers of sections, sectional distances and sequencing platforms, we observed outstanding performance at the tissue, cell, and gene levels. ST-GEARS provides precise and well-explainable 'gears' between in vivo situations and in vitro analysis, substantially increasing the potential for biological discovery.


Subjects
Transcriptome; Animals; Imaging, Three-Dimensional/methods; Mice; Gene Expression Profiling/methods; Humans; Algorithms
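ST-GEARS first rigidly aligns sections using the anchors it retrieves. Given anchor pairs (a spot on one section matched to a spot on the next), the least-squares rotation and translation in 2D has a closed form, the two-dimensional case of the Kabsch/Procrustes problem. This minimal sketch assumes the anchors are already known and equally weighted; ST-GEARS's anchor retrieval with Distributive Constraints and its elastic-field correction are not shown, and the anchors are invented.

```python
from math import atan2, cos, sin

Point = tuple[float, float]


def rigid_align(source: list[Point], target: list[Point]) -> tuple[float, Point]:
    """Least-squares rotation angle and translation mapping source anchors onto targets.

    source[i] and target[i] form one anchor pair (matched spots on two sections).
    """
    n = len(source)
    sx, sy = sum(p[0] for p in source) / n, sum(p[1] for p in source) / n
    tx, ty = sum(p[0] for p in target) / n, sum(p[1] for p in target) / n
    # Cross- and dot-product sums of the centred point sets give the optimal angle.
    num = den = 0.0
    for (x, y), (u, v) in zip(source, target):
        x, y, u, v = x - sx, y - sy, u - tx, v - ty
        num += x * v - y * u
        den += x * u + y * v
    theta = atan2(num, den)
    c, s = cos(theta), sin(theta)
    # The translation moves the rotated source centroid onto the target centroid.
    return theta, (tx - (c * sx - s * sy), ty - (s * sx + c * sy))


def apply_transform(theta: float, shift: Point, points: list[Point]) -> list[Point]:
    """Rotate points by theta about the origin, then translate by shift."""
    c, s = cos(theta), sin(theta)
    return [(c * x - s * y + shift[0], s * x + c * y + shift[1]) for x, y in points]


# Hypothetical anchors: the second section is the first rotated by 90° and shifted by (5, -1).
source = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0)]
target = [(5.0, -1.0), (5.0, 0.0), (3.0, -1.0)]
theta, shift = rigid_align(source, target)
print(theta, shift)
```

Using `atan2` on the summed cross and dot products always yields a proper rotation, so a section is never mirrored by this step.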
16.
GigaByte ; 2024: gigabyte111, 2024.
Article in English | MEDLINE | ID: mdl-38434930

ABSTRACT

The basic analysis steps of spatial transcriptomics require obtaining gene expression information both spatially and at the cell level. Existing tools for these analyses suffer performance issues with large datasets: computationally intensive spatial localization and RNA genome alignment, and excessive memory usage for large chips. These problems limit the applicability and efficiency of the analysis. Here, a high-performance and accurate spatial transcriptomics data analysis workflow, the Stereo-seq Analysis Workflow (SAW), was developed for the Stereo-seq technology developed at BGI. SAW includes mRNA spatial position reconstruction, genome alignment, gene expression matrix generation, and clustering. The workflow outputs files in a universal format for subsequent personalized analysis. The entire analysis takes ∼148 min on test data of 1 GB reads from a 1 × 1 cm chip, 1.8 times faster than an unoptimized workflow.

17.
Gigascience ; 13(1)2024 01 02.
Article in English | MEDLINE | ID: mdl-38373745

ABSTRACT

BACKGROUND: Cell clustering is a pivotal aspect of spatial transcriptomics (ST) data analysis, as it forms the foundation for subsequent data mining. Recent advances in spatial domain identification have leveraged graph neural network (GNN) approaches in conjunction with spatial transcriptomics data. However, such GNN-based methods suffer from representation collapse, in which all spatial spots are projected onto a single representation. Consequently, the discriminative capability of individual representation features is limited, leading to suboptimal clustering performance. RESULTS: To address this issue, we proposed SGAE, a novel framework for spatial domain identification that incorporates the power of the Siamese graph autoencoder. SGAE reduces information correlation at both the sample and feature levels, thus improving representation discrimination. We adapted this framework to ST analysis by constructing a graph based on both gene expression and spatial information. SGAE outperformed alternative methods in capturing spatial patterns and generating high-quality clusters, as evaluated by the Adjusted Rand Index, Normalized Mutual Information, and Fowlkes-Mallows Index. Moreover, the clustering results derived from SGAE can be further used to identify 3-dimensional (3D) Drosophila embryonic structure with enhanced accuracy. CONCLUSIONS: Benchmarking results from various ST datasets generated by diverse platforms provide compelling evidence for the effectiveness of SGAE against other ST clustering methods. Specifically, SGAE shows potential for extension and application to multislice 3D reconstruction and tissue structure investigation. The source code and a collection of spatial clustering results can be accessed at https://github.com/STOmics/SGAE/.


Subjects
Benchmarking; Gene Expression Profiling; Animals; Cluster Analysis; Data Mining; Drosophila/genetics
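SGAE is evaluated with the Adjusted Rand Index (ARI), Normalized Mutual Information and the Fowlkes-Mallows Index (FMI). ARI and FMI both compare two labelings by counting pairs of spots placed in the same cluster; ARI additionally corrects for chance agreement. A standard-library sketch of both formulas follows; scikit-learn's `adjusted_rand_score` and `fowlkes_mallows_score` compute the same quantities.

```python
from collections import Counter
from math import comb, sqrt


def _pair_counts(truth: list, pred: list) -> tuple[int, int, int, int]:
    """Pairs co-clustered in both labelings, in `truth` only-counted, and in `pred`."""
    if len(truth) != len(pred) or len(truth) < 2:
        raise ValueError("need two labelings of equal length >= 2")
    both = sum(comb(c, 2) for c in Counter(zip(truth, pred)).values())
    in_truth = sum(comb(c, 2) for c in Counter(truth).values())
    in_pred = sum(comb(c, 2) for c in Counter(pred).values())
    return len(truth), both, in_truth, in_pred


def adjusted_rand_index(truth: list, pred: list) -> float:
    n, both, in_truth, in_pred = _pair_counts(truth, pred)
    expected = in_truth * in_pred / comb(n, 2)  # chance-level agreement
    max_index = (in_truth + in_pred) / 2
    if max_index == expected:  # degenerate labelings (e.g. one cluster each)
        return 1.0
    return (both - expected) / (max_index - expected)


def fowlkes_mallows(truth: list, pred: list) -> float:
    _, both, in_truth, in_pred = _pair_counts(truth, pred)
    # Geometric mean of pairwise precision and recall.
    return both / sqrt(in_truth * in_pred) if in_truth and in_pred else 0.0


print(adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0]))  # 1.0: labels are only permuted
print(adjusted_rand_index([0, 0, 1, 1], [0, 0, 1, 2]))  # 4/7 ≈ 0.571
```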
18.
GigaByte ; 2024: gigabyte110, 2024.
Article in English | MEDLINE | ID: mdl-38434932

ABSTRACT

In spatially resolved transcriptomics, Stereo-seq facilitates the analysis of large tissues at the single-cell level, offering subcellular resolution and centimeter-level field-of-view. Our previous work on StereoCell introduced a one-stop software using cell nuclei staining images and statistical methods to generate high-confidence single-cell spatial gene expression profiles for Stereo-seq data. With advancements allowing the acquisition of cell boundary information, such as cell membrane/wall staining images, we updated our software to a new version, STCellbin. Using cell nuclei staining images, STCellbin aligns cell membrane/wall staining images with spatial gene expression maps. Advanced cell segmentation ensures the detection of accurate cell boundaries, leading to more reliable single-cell spatial gene expression profiles. We verified that STCellbin can be applied to mouse liver (cell membranes) and Arabidopsis seed (cell walls) datasets, outperforming other methods. The improved capability of capturing single-cell gene expression profiles results in a deeper understanding of the contribution of single-cell phenotypes to tissue biology. Availability & Implementation: The source code of STCellbin is available at https://github.com/STOmics/STCellbin.

19.
Genomics Proteomics Bioinformatics ; 21(1): 24-47, 2023 02.
Article in English | MEDLINE | ID: mdl-36252814

ABSTRACT

The development of spatial transcriptomics (ST) technologies has transformed genetic research from a single-cell data level to a two-dimensional spatial coordinate system and facilitated the study of the composition and function of various cell subsets in different environments and organs. The large-scale data generated by these ST technologies, which contain spatial gene expression information, have elicited the need for spatially resolved approaches to meet the requirements of computational and biological data interpretation. These requirements include dealing with the explosive growth of data to determine the cell-level and gene-level expression, correcting the inner batch effect and loss of expression to improve the data quality, conducting efficient interpretation and in-depth knowledge mining both at the single-cell and tissue-wide levels, and conducting multi-omics integration analysis to provide an extensible framework toward the in-depth understanding of biological processes. However, algorithms designed specifically for ST technologies to meet these requirements are still in their infancy. Here, we review computational approaches to these problems in light of corresponding issues and challenges, and present forward-looking insights into algorithm development.


Subjects
Gene Expression Profiling; Transcriptome; Algorithms; Multiomics
20.
Front Oncol ; 11: 644443, 2021.
Article in English | MEDLINE | ID: mdl-33768004

ABSTRACT

Background: Molecular characteristics can be good indicators of tumor prognosis and have been introduced into the classification of gliomas. The prognosis of patients with newly classified lower-grade gliomas (LGGs, including grade 2 and grade 3 gliomas) is highly heterogeneous, and new molecular markers are urgently needed. Methods: Autophagy-related genes (ATGs) were obtained from the Human Autophagy Database (HADb). Gene expression profiles including ATG expression information, together with patient clinical data, were downloaded from The Cancer Genome Atlas (TCGA) and the Chinese Glioma Genome Atlas (CGGA). Cox regression analysis, receiver operating characteristic (ROC) analysis, Kaplan-Meier analysis, the random survival forest algorithm (RSFVH) and stratification analysis were performed. Results: Through univariate Cox regression analysis, we found a total of 127 ATGs associated with the prognosis of LGG patients in the TCGA dataset and a total of 131 survival-related ATGs in the CGGA dataset. Using the TCGA dataset as the training group (n = 524), we constructed a five-ATG signature (BAG1, BID, MAP1LC3C, NRG3, PTK6) that could divide LGG patients into two risk groups with significantly different overall survival (log-rank P < 0.001). We then confirmed in the independent CGGA dataset that the five-ATG signature could predict prognosis (n = 431, log-rank P < 0.001). We further found that the predictive ability of the five-ATG signature was better than that of existing clinical indicators and IDH mutation status. In addition, the five-ATG signature could further classify patients who had received radiotherapy or chemotherapy into groups with different prognoses. Conclusions: We identified a five-ATG signature that could be a reliable prognostic marker and whose genes might be therapeutic targets for autophagy-based therapy in LGG patients.
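Both this study and the HCC miRNA study compare risk groups with the log-rank test. A minimal two-group version: at each distinct event time, the observed events in group 1 are compared with the number expected if both groups shared a single hazard, the hypergeometric variances are summed, and the statistic is referred to a chi-square distribution with 1 degree of freedom. The follow-up data are invented; in practice a maintained library such as lifelines (`lifelines.statistics.logrank_test`) would be the usual choice.

```python
from math import erfc, sqrt


def logrank_test(
    times1: list[float], events1: list[int], times2: list[float], events2: list[int]
) -> tuple[float, float]:
    """Two-group log-rank test. events: 1 = event observed, 0 = censored.

    Returns (chi-square statistic, p-value with 1 degree of freedom).
    """
    data = [(t, e, 0) for t, e in zip(times1, events1)]
    data += [(t, e, 1) for t, e in zip(times2, events2)]
    o_minus_e = 0.0
    variance = 0.0
    for t in sorted({time for time, e, _ in data if e}):
        at_risk = [g for time, _, g in data if time >= t]
        n, n1 = len(at_risk), at_risk.count(0)
        died = [g for time, e, g in data if time == t and e]
        d, d1 = len(died), died.count(0)
        # Observed minus expected events in group 1 under a shared hazard.
        o_minus_e += d1 - d * n1 / n
        if n > 1:  # hypergeometric variance of d1
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    statistic = o_minus_e ** 2 / variance
    # Survival function of chi-square(1): P(Z^2 > x) = erfc(sqrt(x / 2)).
    return statistic, erfc(sqrt(statistic / 2))


# Invented follow-up times in years; every patient has an observed event.
high = ([1, 2, 3], [1, 1, 1])
low = ([4, 5, 6], [1, 1, 1])
print(logrank_test(*high, *low))  # statistic ≈ 5.05, p ≈ 0.025
```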
