Results 1 - 20 of 22
1.
GigaByte ; 2024: gigabyte109, 2024.
Article in English | MEDLINE | ID: mdl-38440167

ABSTRACT

This paper introduces a new approach to cell clustering using the Variable Neighborhood Search (VNS) metaheuristic. The purpose of this method is to cluster cells based on both gene expression and spatial coordinates. Initially, we confronted this clustering challenge as an Integer Linear Programming minimization problem. Our approach introduced a novel model based on the VNS technique, demonstrating the efficacy in navigating the complexities of cell clustering. Notably, our method extends beyond conventional cell-type clustering to spatial domain clustering. This adaptability enables our algorithm to orchestrate clusters based on information gleaned from gene expression matrices and spatial coordinates. Our validation showed the superior performance of our method when compared to existing techniques. Our approach advances current clustering methodologies and can potentially be applied to several fields, from biomedical research to spatial data analysis.
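The abstract above names the Variable Neighborhood Search metaheuristic but gives no implementation details. The following is only a generic VNS sketch for clustering, under stated assumptions: each cell is a tuple of features (for example, an expression value followed by x and y coordinates), the objective is the within-cluster sum of squares, and the function names `cost`, `local_search`, and `vns` are hypothetical. It is not the model described in the paper.

```python
import random

def cost(points, labels, k):
    """Within-cluster sum of squared distances to each cluster centroid."""
    dim = len(points[0])
    sums = [[0.0] * dim for _ in range(k)]
    counts = [0] * k
    for p, c in zip(points, labels):
        counts[c] += 1
        for d in range(dim):
            sums[c][d] += p[d]
    total = 0.0
    for p, c in zip(points, labels):
        for d in range(dim):
            total += (p[d] - sums[c][d] / counts[c]) ** 2
    return total

def local_search(points, labels, k):
    """Reassign single points while any reassignment lowers the cost."""
    best = cost(points, labels, k)
    improved = True
    while improved:
        improved = False
        for i in range(len(points)):
            keep = labels[i]
            for c in range(k):
                if c == keep:
                    continue
                labels[i] = c
                value = cost(points, labels, k)
                if value < best - 1e-12:
                    best, keep, improved = value, c, True
            labels[i] = keep  # settle on the best label found for point i
    return labels, best

def vns(points, k, k_max=3, iterations=50, seed=0):
    """Generic VNS loop: shake `size` labels, run local search, accept if better."""
    rng = random.Random(seed)
    labels = [rng.randrange(k) for _ in points]
    labels, best = local_search(points, labels, k)
    for _ in range(iterations):
        size = 1
        while size <= k_max:
            candidate = labels[:]
            for i in rng.sample(range(len(points)), min(size, len(points))):
                candidate[i] = rng.randrange(k)  # shaking step
            candidate, value = local_search(points, candidate, k)
            if value < best - 1e-12:
                labels, best, size = candidate, value, 1  # restart neighborhoods
            else:
                size += 1  # widen the neighborhood
    return labels, best
```

Growing the shake size only after a failed attempt is the usual VNS design choice: small perturbations are tried first, and larger ones are used only when the search stalls.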

2.
GigaByte ; 2024: gigabyte111, 2024.
Article in English | MEDLINE | ID: mdl-38434930

ABSTRACT

The basic analysis steps of spatial transcriptomics require obtaining gene expression information from both space and cells. The existing tools for these analyses incur performance issues when dealing with large datasets. These issues involve computationally intensive spatial localization, RNA genome alignment, and excessive memory usage in large chip scenarios. These problems affect the applicability and efficiency of the analysis. Here, a high-performance and accurate spatial transcriptomics data analysis workflow, called Stereo-seq Analysis Workflow (SAW), was developed for the Stereo-seq technology developed at BGI. SAW includes mRNA spatial position reconstruction, genome alignment, gene expression matrix generation, and clustering. The workflow outputs files in a universal format for subsequent personalized analysis. The execution time for the entire analysis is ∼148 min with 1 GB reads 1 × 1 cm chip test data, 1.8 times faster than with an unoptimized workflow.

3.
GigaByte ; 2024: gigabyte108, 2024.
Article in English | MEDLINE | ID: mdl-38434931

ABSTRACT

As genomic sequencing technology continues to advance, it becomes increasingly important to perform joint analyses of multiple datasets of transcriptomics. However, batch effect presents challenges for dataset integration, such as sequencing data measured on different platforms, and datasets collected at different times. Here, we report the development of BatchEval Pipeline, a batch effect workflow used to evaluate batch effect on dataset integration. The BatchEval Pipeline generates a comprehensive report, which consists of a series of HTML pages for assessment findings, including a main page, a raw dataset evaluation page, and several built-in methods evaluation pages. The main page exhibits basic information of the integrated datasets, a comprehensive score of batch effect, and the most recommended method for removing batch effect from the current datasets. The remaining pages exhibit evaluation details for the raw dataset, and evaluation results from the built-in batch effect removal methods after removing batch effect. This comprehensive report enables researchers to accurately identify and remove batch effects, resulting in more reliable and meaningful biological insights from integrated datasets. In summary, the BatchEval Pipeline represents a significant advancement in batch effect evaluation, and is a valuable tool to improve the accuracy and reliability of the experimental results. Availability & Implementation: The source code of the BatchEval Pipeline is available at https://github.com/STOmics/BatchEval.

4.
GigaByte ; 2024: gigabyte110, 2024.
Article in English | MEDLINE | ID: mdl-38434932

ABSTRACT

In spatially resolved transcriptomics, Stereo-seq facilitates the analysis of large tissues at the single-cell level, offering subcellular resolution and centimeter-level field-of-view. Our previous work on StereoCell introduced a one-stop software using cell nuclei staining images and statistical methods to generate high-confidence single-cell spatial gene expression profiles for Stereo-seq data. With advancements allowing the acquisition of cell boundary information, such as cell membrane/wall staining images, we updated our software to a new version, STCellbin. Using cell nuclei staining images, STCellbin aligns cell membrane/wall staining images with spatial gene expression maps. Advanced cell segmentation ensures the detection of accurate cell boundaries, leading to more reliable single-cell spatial gene expression profiles. We verified that STCellbin can be applied to mouse liver (cell membranes) and Arabidopsis seed (cell walls) datasets, outperforming other methods. The improved capability of capturing single-cell gene expression profiles results in a deeper understanding of the contribution of single-cell phenotypes to tissue biology. Availability & Implementation: The source code of STCellbin is available at https://github.com/STOmics/STCellbin.

5.
Gigascience ; 13(1)2024 Jan 02.
Article in English | MEDLINE | ID: mdl-38373746

ABSTRACT

BACKGROUND: The emergence of high-resolved spatial transcriptomics (ST) has facilitated the research of novel methods to investigate biological development, organism growth, and other complex biological processes. However, high-resolved and whole transcriptomics ST datasets require customized imputation methods to improve the signal-to-noise ratio and the data quality. FINDINGS: We propose an efficient and adaptive Gaussian smoothing (EAGS) imputation method for high-resolved ST. The adaptive 2-factor smoothing of EAGS creates patterns based on the spatial and expression information of the cells, creates adaptive weights for the smoothing of cells in the same pattern, and then utilizes the weights to restore the gene expression profiles. We assessed the performance and efficiency of EAGS using simulated and high-resolved ST datasets of mouse brain and olfactory bulb. CONCLUSIONS: Compared with other competitive methods, EAGS shows higher clustering accuracy, better biological interpretations, and significantly reduced computational consumption.
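As a rough illustration of the idea behind Gaussian smoothing for imputation, the sketch below replaces each cell's expression value with a Gaussian-weighted average over all cells, weighted by spatial distance. It assumes a single gene, 2D coordinates, and a fixed bandwidth `sigma`; the function name `gaussian_smooth` is hypothetical. EAGS itself is described as adaptive and 2-factor (spatial and expression information), which this fixed-bandwidth version does not reproduce.

```python
import math

def gaussian_smooth(coords, expr, sigma=1.0):
    """Smooth one gene's expression with Gaussian weights on spatial distance."""
    smoothed = []
    for xi, yi in coords:
        # Weight of every cell j relative to the current cell i.
        weights = [
            math.exp(-((xi - xj) ** 2 + (yi - yj) ** 2) / (2 * sigma ** 2))
            for xj, yj in coords
        ]
        total = sum(weights)  # always > 0 because the cell weights itself by 1
        smoothed.append(sum(w * e for w, e in zip(weights, expr)) / total)
    return smoothed
```

With this weighting, a cell whose measured value dropped to zero is pulled toward the values of its spatial neighbors, which is the sense in which smoothing can restore a noisy profile.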


Subject(s)
Magnetic Resonance Imaging , Transcriptome , Animals , Mice , Magnetic Resonance Imaging/methods , Gene Expression Profiling , Normal Distribution , Signal-To-Noise Ratio
6.
Gigascience ; 13(1)2024 Jan 02.
Article in English | MEDLINE | ID: mdl-38373745

ABSTRACT

BACKGROUND: Cell clustering is a pivotal aspect of spatial transcriptomics (ST) data analysis as it forms the foundation for subsequent data mining. Recent advances in spatial domain identification have leveraged graph neural network (GNN) approaches in conjunction with spatial transcriptomics data. However, such GNN-based methods suffer from representation collapse, wherein all spatial spots are projected onto a singular representation. Consequently, the discriminative capability of individual representation feature is limited, leading to suboptimal clustering performance. RESULTS: To address this issue, we proposed SGAE, a novel framework for spatial domain identification, incorporating the power of the Siamese graph autoencoder. SGAE mitigates the information correlation at both sample and feature levels, thus improving the representation discrimination. We adapted this framework to ST analysis by constructing a graph based on both gene expression and spatial information. SGAE outperformed alternative methods by its effectiveness in capturing spatial patterns and generating high-quality clusters, as evaluated by the Adjusted Rand Index, Normalized Mutual Information, and Fowlkes-Mallows Index. Moreover, the clustering results derived from SGAE can be further utilized in the identification of 3-dimensional (3D) Drosophila embryonic structure with enhanced accuracy. CONCLUSIONS: Benchmarking results from various ST datasets generated by diverse platforms demonstrate compelling evidence for the effectiveness of SGAE against other ST clustering methods. Specifically, SGAE exhibits potential for extension and application on multislice 3D reconstruction and tissue structure investigation. The source code and a collection of spatial clustering results can be accessed at https://github.com/STOmics/SGAE/.
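Of the three evaluation metrics named in the abstract, the Adjusted Rand Index is the simplest to compute by hand. The sketch below is a plain standard-library implementation of the textbook formula, not code from the SGAE repository; the function name `adjusted_rand_index` is an assumption, and the inputs are assumed to be two equal-length label lists with at least two items.

```python
from collections import Counter
from math import comb

def adjusted_rand_index(truth, pred):
    """Adjusted Rand Index between two labelings of the same items."""
    n = len(truth)
    # Pair counts inside each contingency-table cell, row, and column.
    sum_cells = sum(comb(c, 2) for c in Counter(zip(truth, pred)).values())
    sum_rows = sum(comb(c, 2) for c in Counter(truth).values())
    sum_cols = sum(comb(c, 2) for c in Counter(pred).values())
    expected = sum_rows * sum_cols / comb(n, 2)  # chance-level agreement
    max_index = (sum_rows + sum_cols) / 2
    if max_index == expected:
        return 1.0  # degenerate case, e.g. both labelings are one cluster
    return (sum_cells - expected) / (max_index - expected)
```

The index depends only on which items are grouped together, so renaming the clusters does not change the score.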


Subject(s)
Benchmarking , Gene Expression Profiling , Animals , Cluster Analysis , Data Mining , Drosophila/genetics
7.
Genomics Proteomics Bioinformatics ; 21(1): 24-47, 2023 02.
Article in English | MEDLINE | ID: mdl-36252814

ABSTRACT

The development of spatial transcriptomics (ST) technologies has transformed genetic research from a single-cell data level to a two-dimensional spatial coordinate system and facilitated the study of the composition and function of various cell subsets in different environments and organs. The large-scale data generated by these ST technologies, which contain spatial gene expression information, have elicited the need for spatially resolved approaches to meet the requirements of computational and biological data interpretation. These requirements include dealing with the explosive growth of data to determine the cell-level and gene-level expression, correcting the inner batch effect and loss of expression to improve the data quality, conducting efficient interpretation and in-depth knowledge mining both at the single-cell and tissue-wide levels, and conducting multi-omics integration analysis to provide an extensible framework toward the in-depth understanding of biological processes. However, algorithms designed specifically for ST technologies to meet these requirements are still in their infancy. Here, we review computational approaches to these problems in light of corresponding issues and challenges, and present forward-looking insights into algorithm development.


Subject(s)
Gene Expression Profiling , Transcriptome , Algorithms , Multiomics
8.
Nucleic Acids Res ; 50(D1): D391-D401, 2022 01 07.
Article in English | MEDLINE | ID: mdl-34718747

ABSTRACT

Transcription co-factors (TcoFs) play crucial roles in gene expression regulation by communicating regulatory cues from enhancers to promoters. With the rapid accumulation of TcoF associated chromatin immunoprecipitation sequencing (ChIP-seq) data, the comprehensive collection and integrative analyses of these data are urgently required. Here, we developed the TcoFBase database (http://tcof.liclab.net/TcoFbase), which aimed to document a large number of available resources for mammalian TcoFs and provided annotations and enrichment analyses of TcoFs. TcoFBase curated 2322 TcoFs and 6759 TcoFs associated ChIP-seq data from over 500 tissues/cell types in human and mouse. Importantly, TcoFBase provided detailed and abundant (epi) genetic annotations of ChIP-seq based TcoF binding regions. Furthermore, TcoFBase supported regulatory annotation information and various functional annotations for TcoFs. Meanwhile, TcoFBase embedded five types of TcoF regulatory analyses for users, including TcoF gene set enrichment, TcoF binding genomic region annotation, TcoF regulatory network analysis, TcoF-TF co-occupancy analysis and TcoF regulatory axis analysis. TcoFBase was designed to be a useful resource that will help reveal the potential biological effects of TcoFs and elucidate TcoF-related regulatory mechanisms.


Subject(s)
Databases, Genetic , Gene Regulatory Networks , Software , Transcription Factors/genetics , Transcription, Genetic , Animals , Chromatin/chemistry , Chromatin/metabolism , Datasets as Topic , Enhancer Elements, Genetic , Gene Expression Regulation , Humans , Internet , Mice , Molecular Sequence Annotation , Promoter Regions, Genetic , Transcription Factors/classification , Transcription Factors/metabolism
9.
Nucleic Acids Res ; 49(W1): W317-W325, 2021 07 02.
Article in English | MEDLINE | ID: mdl-34086934

ABSTRACT

Gene set enrichment (GSE) analysis plays an essential role in extracting biological insight from genome-scale experiments. ORA (overrepresentation analysis), FCS (functional class scoring), and PT (pathway topology) approaches are three generations of GSE methods along the timeline of development. Previous versions of KOBAS provided services based on just the ORA method. Here we presented version 3.0 of KOBAS, which is named KOBAS-i (short for KOBAS intelligent version). It introduced a novel machine learning-based method we published earlier, CGPS, which incorporates seven FCS tools and two PT tools into a single ensemble score and intelligently prioritizes the relevant biological pathways. In addition, KOBAS has expanded the downstream exploratory visualization for selecting and understanding the enriched results. The tool constructs a novel view of cirFunMap, which presents different enriched terms and their correlations in a landscape. Finally, based on the previous version's framework, KOBAS increased the number of supported species from 1327 to 5944. For an easier local run, it also provides a prebuilt Docker image that requires no installation, as a supplementary to the source code version. KOBAS can be freely accessed at http://kobas.cbi.pku.edu.cn, and a mirror site is available at http://bioinfo.org/kobas.
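Overrepresentation analysis, the first of the three method generations listed above, is commonly carried out as a one-sided hypergeometric test. The sketch below shows that generic test using only the standard library; it is not KOBAS's implementation, and the function name `ora_p_value` and its parameter names are hypothetical.

```python
from math import comb

def ora_p_value(population, pathway_size, selected, overlap):
    """P(X >= overlap) when `selected` genes are drawn from `population`
    genes, of which `pathway_size` belong to the pathway."""
    upper = min(pathway_size, selected)
    total = comb(population, selected)
    tail = sum(
        comb(pathway_size, k) * comb(population - pathway_size, selected - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total
```

For example, `ora_p_value(10, 3, 3, 3)` is the probability that all three selected genes fall in a three-gene pathway out of ten background genes. In practice the p-values for many pathways would also need multiple-testing correction, which this sketch omits.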


Subject(s)
Genes , Software , Gene Expression , Gene Ontology , Machine Learning , Proteins/genetics
10.
Front Oncol ; 11: 644443, 2021.
Article in English | MEDLINE | ID: mdl-33768004

ABSTRACT

Background: Molecular characteristics can be good indicators of tumor prognosis and have been introduced into the classification of gliomas. The prognosis of patients with newly classified lower-grade gliomas (LGGs, including grade 2 and grade 3 gliomas) is highly heterogeneous, and new molecular markers are urgently needed. Methods: Autophagy related genes (ATGs) were obtained from Human Autophagy Database (HADb). From the Cancer Genome Atlas (TCGA) and the Chinese Glioma Genome Atlas (CGGA), gene expression profiles including ATG expression information and patient clinical data were downloaded. Cox regression analysis, receiver operating characteristic (ROC) analysis, Kaplan-Meier analysis, random survival forest algorithm (RSFVH) and stratification analysis were performed. Results: Through univariate Cox regression analysis, we found a total of 127 ATGs associated with the prognosis of LGG patients from TCGA dataset and a total of 131 survival-related ATGs from CGGA dataset. Using TCGA dataset as the training group (n = 524), we constructed a five-ATG signature (including BAG1, BID, MAP1LC3C, NRG3, PTK6), which could divide LGG patients into two risk groups with significantly different overall survival (Log Rank P < 0.001). Then we confirmed in the independent CGGA dataset that the five-ATG signature had the ability to predict prognosis (n = 431, Log Rank P < 0.001). We further discovered that the predictive ability of the five-ATG signature was better than the existing clinical indicators and IDH mutation status. In addition, the five-ATG signature could further classify patients after receiving radiotherapy or chemotherapy into groups with different prognosis. Conclusions: We identified a five-ATG signature that could be a reliable prognostic marker and might be therapeutic targets for autophagy therapy for LGG patients.

11.
Nucleic Acids Res ; 49(D1): D1197-D1206, 2021 01 08.
Article in English | MEDLINE | ID: mdl-33264402

ABSTRACT

Pharmacotranscriptomics has become a powerful approach for evaluating the therapeutic efficacy of drugs and discovering new drug targets. Recently, studies of traditional Chinese medicine (TCM) have increasingly turned to high-throughput transcriptomic screens for molecular effects of herbs/ingredients. Numerous studies have also examined gene targets for herbs/ingredients and linked herbs/ingredients to various modern diseases. However, there is currently no systematic database organizing these data for TCM. Therefore, we built HERB, a high-throughput experiment- and reference-guided database of TCM, with its Chinese name as BenCaoZuJian. We re-analyzed 6164 gene expression profiles from 1037 high-throughput experiments evaluating TCM herbs/ingredients, and generated connections between TCM herbs/ingredients and 2837 modern drugs by mapping the comprehensive pharmacotranscriptomics dataset in HERB to CMap, the largest such dataset for modern drugs. Moreover, we manually curated 1241 gene targets and 494 modern diseases for 473 herbs/ingredients from 1966 references published recently, and cross-referenced this novel information to databases containing such data for drugs. Together with database mining and statistical inference, we linked 12 933 targets and 28 212 diseases to 7263 herbs and 49 258 ingredients and provided six pairwise relationships among them in HERB. In summary, HERB will intensively support the modernization of TCM and guide rational modern drug discovery efforts. It is accessible through http://herb.ac.cn/.


Subject(s)
Databases, Factual , Drugs, Chinese Herbal/therapeutic use , Medicine, Chinese Traditional/methods , Pharmacogenetics/methods , Software , Animals , Computational Biology/methods , Datasets as Topic , Drugs, Chinese Herbal/chemistry , High-Throughput Screening Assays , Humans , Internet , Mice , Molecular Targeted Therapy/methods , Plant Extracts/chemistry , Plant Extracts/therapeutic use , Transcriptome
12.
J Mol Diagn ; 23(3): 285-299, 2021 03.
Article in English | MEDLINE | ID: mdl-33346148

ABSTRACT

Next-generation sequencing is increasingly being adopted as a valuable method for the detection of somatic variants in clinical oncology. However, it is still challenging to reach a satisfactory level of robustness and standardization in clinical practice when using the currently available bioinformatics pipelines to detect variants from raw sequencing data. Moreover, appropriate reference data sets are lacking for clinical bioinformatics pipeline development, validation, and proficiency testing. Herein, we developed the Variant Benchmark tool (VarBen), an open-source software for variant simulation to generate customized reference data sets by directly editing the original sequencing reads. VarBen can introduce a variety of variants, including single-nucleotide variants, small insertions and deletions, and large structural variants, into targeted, exome, or whole-genome sequencing data, and can handle sequencing data from both the Illumina and Ion Torrent sequencing platforms. To demonstrate the feasibility and robustness of VarBen, we performed variant simulation on different sequencing data sets and compared the simulated variants with real-world data. The validation study showed that the simulated data are highly comparable to real-world data and that VarBen is a reliable tool for variant simulation. In addition, our collaborative study of somatic variant calling in 20 laboratories emphasizes the need for laboratories to evaluate their bioinformatics pipelines with customized reference data sets. VarBen may help users develop and validate their bioinformatics pipelines using locally generated sequencing data.


Subject(s)
Computational Biology/methods , Genetic Association Studies/methods , Genetic Predisposition to Disease , Genetic Variation , High-Throughput Nucleotide Sequencing , Software , Computational Biology/standards , Genetic Association Studies/standards , Genome-Wide Association Study/methods , Genome-Wide Association Study/standards , Humans , INDEL Mutation , Mutation , Polymorphism, Single Nucleotide , Reproducibility of Results
13.
Nucleic Acids Res ; 49(D1): D165-D171, 2021 01 08.
Article in English | MEDLINE | ID: mdl-33196801

ABSTRACT

NONCODE (http://www.noncode.org/) is a comprehensive database of collection and annotation of noncoding RNAs, especially long non-coding RNAs (lncRNAs) in animals. NONCODEV6 is dedicated to providing the full scope of lncRNAs across plants and animals. The number of lncRNAs in NONCODEV6 has increased from 548 640 to 644 510 since the last update in 2017. The number of human lncRNAs has increased from 172 216 to 173 112. The number of mouse lncRNAs increased from 131 697 to 131 974. The number of plant lncRNAs is 94 697. The relationships between lncRNAs in human and cancer were updated with transcriptome sequencing profiles. Three important new features were also introduced in NONCODEV6: (i) updated human lncRNA-disease relationships, especially cancer; (ii) lncRNA annotations with tissue expression profiles and predicted function in five common plants; (iii) lncRNA conservation annotation at transcript level for 23 plant species. NONCODEV6 is accessible through http://www.noncode.org/.


Subject(s)
Databases, Nucleic Acid , Neoplasms/genetics , RNA, Long Noncoding/genetics , RNA, Messenger/genetics , Software , Transcriptome , Animals , Base Sequence , Conserved Sequence , Exons , Gene Expression Profiling , Humans , Internet , Mice , Molecular Sequence Annotation , Neoplasms/classification , Neoplasms/metabolism , Neoplasms/pathology , Plants/genetics , RNA, Long Noncoding/classification , RNA, Long Noncoding/metabolism , RNA, Messenger/classification , RNA, Messenger/metabolism
14.
J Cell Physiol ; 235(4): 3569-3578, 2020 04.
Article in English | MEDLINE | ID: mdl-31556110

ABSTRACT

Studies have shown that microRNAs (miRNAs) play a vital role in tumor progression and patients' prognosis. Therefore, we aimed to construct a miRNA model for forecasting the survival of hepatocellular carcinoma (HCC) patients. The gene expression data of 433 patients with HCC from The Cancer Genome Atlas (TCGA) and Gene Expression Omnibus public databases were re-mined using survival analysis and receiver operating characteristic (ROC) curve analysis. A prognostic model including six miRNAs (hsa-mir-26a-1-3p, hsa-mir-188-5p, hsa-mir-212-5p, hsa-mir-149-5p, hsa-mir-105-5p, and hsa-mir-132-5p) was constructed in the training dataset (TCGA, n = 333). HCC patients were stratified into a high-risk group and a low-risk group with significantly different survival (median: 2.75 vs. 8.93 years, log-rank test p < .001). We then confirmed its stratification performance in another independent dataset (GSE116182, median: 2.55 vs. 6.96 years, log-rank test p = .008). Cox regression analysis showed that the prognostic model was an independent prognostic indicator for HCC patients. Time-dependent ROC analyses were then performed to compare the prognostic ability of the model with that of TNM staging; the model performed better, especially at 5 years (AUC = 0.76). Functional prediction showed that the genes targeted by the six prognostic miRNAs in the prognostic model were highly expressed in the P53-related pathway. In conclusion, we constructed a prognostic miRNA model that could indicate the survival of HCC patients.


Subject(s)
Carcinoma, Hepatocellular/genetics , Liver Neoplasms/genetics , MicroRNAs/genetics , Tumor Suppressor Protein p53/genetics , Adolescent , Adult , Aged , Aged, 80 and over , Biomarkers, Tumor , Carcinoma, Hepatocellular/pathology , Disease-Free Survival , Female , Gene Expression Regulation, Neoplastic/genetics , Humans , Kaplan-Meier Estimate , Liver Neoplasms/pathology , Male , Middle Aged , Neoplasm Staging , Prognosis , Risk Factors , Transcriptome/genetics , Young Adult
15.
Nucleic Acids Res ; 47(W1): W516-W522, 2019 07 02.
Article in English | MEDLINE | ID: mdl-31147700

ABSTRACT

As more and more high-throughput data has been produced by next-generation sequencing, it is still a challenge to classify RNA transcripts into protein-coding or non-coding, especially for poorly annotated species. We upgraded our original coding potential calculator, CNCI (Coding-Non-Coding Index), to CNIT (Coding-Non-Coding Identifying Tool), which provides faster and more accurate evaluation of the coding ability of RNA transcripts. CNIT runs ∼200 times faster than CNCI and exhibits more accuracy compared with CNCI (0.98 versus 0.94 for human, 0.95 versus 0.93 for mouse, 0.93 versus 0.92 for zebrafish, 0.93 versus 0.92 for fruit fly, 0.92 versus 0.88 for worm, and 0.98 versus 0.85 for Arabidopsis transcripts). Moreover, the AUC values of 11 animal species and 27 plant species showed that CNIT was capable of obtaining relatively accurate identification results for almost all eukaryotic transcripts. In addition, a mobile-friendly web server is now freely available at http://cnit.noncode.org/CNIT.


Subject(s)
Proteins/genetics , RNA, Long Noncoding/chemistry , Sequence Analysis, RNA , Software , Animals , High-Throughput Nucleotide Sequencing , Humans , Internet , Mice , Neural Cell Adhesion Molecule L1/genetics
16.
Nucleic Acids Res ; 47(D1): D1110-D1117, 2019 01 08.
Article in English | MEDLINE | ID: mdl-30380087

ABSTRACT

Recently, the pharmaceutical industry has heavily emphasized phenotypic drug discovery (PDD), which relies primarily on knowledge about phenotype changes associated with diseases. Traditional Chinese medicine (TCM) provides a massive amount of information on natural products and the clinical symptoms they are used to treat, which are the observable disease phenotypes that are crucial for clinical diagnosis and treatment. Curating knowledge of TCM symptoms and their relationships to herbs and diseases will provide both candidate leads and screening directions for evidence-based PDD programs. Therefore, we present SymMap, an integrative database of traditional Chinese medicine enhanced by symptom mapping. We manually curated 1717 TCM symptoms and related them to 499 herbs and 961 symptoms used in modern medicine based on a committee of 17 leading experts practicing TCM. Next, we collected 5235 diseases associated with these symptoms, 19 595 herbal constituents (ingredients) and 4302 target genes, and built a large heterogeneous network containing all of these components. Thus, SymMap integrates TCM with modern medicine in common aspects at both the phenotypic and molecular levels. Furthermore, we inferred all pairwise relationships among SymMap components using statistical tests to give pharmaceutical scientists the ability to rank and filter promising results to guide drug discovery. The SymMap database can be accessed at http://www.symmap.org/ and https://www.bioinfo.org/symmap.


Subject(s)
Computational Biology/methods , Databases, Factual , Drugs, Chinese Herbal/therapeutic use , Medicine, Chinese Traditional/methods , Molecular Targeted Therapy/methods , Gene Regulatory Networks/drug effects , Gene Regulatory Networks/genetics , Humans , Information Storage and Retrieval/methods , Internet , Medicine, Chinese Traditional/statistics & numerical data , Phytotherapy/methods
17.
Nucleic Acids Res ; 46(D1): D308-D314, 2018 01 04.
Article in English | MEDLINE | ID: mdl-29140524

ABSTRACT

NONCODE (http://www.bioinfo.org/noncode/) is a systematic database that is dedicated to presenting the most complete collection and annotation of non-coding RNAs (ncRNAs), especially long non-coding RNAs (lncRNAs). Since NONCODE 2016 was released two years ago, the amount of novel identified ncRNAs has been enlarged by the reduced cost of next-generation sequencing, which has produced an explosion of newly identified data. The third-generation sequencing revolution has also offered longer and more accurate annotations. Moreover, accumulating evidence confirmed by biological experiments has provided more comprehensive knowledge of lncRNA functions. The ncRNA data set was expanded by collecting newly identified ncRNAs from literature published over the past two years and integration of the latest versions of RefSeq and Ensembl. Additionally, pig was included in the database for the first time, bringing the total number of species to 17. The number of lncRNAs in NONCODEv5 increased from 527 336 to 548 640. NONCODEv5 also introduced three important new features: (i) human lncRNA-disease relationships and single nucleotide polymorphism-lncRNA-disease relationships were constructed; (ii) human exosome lncRNA expression profiles were displayed; (iii) the RNA secondary structures of NONCODE human transcripts were predicted. NONCODEv5 is also accessible through http://www.noncode.org/.


Subject(s)
Databases, Genetic , Molecular Sequence Annotation , RNA, Long Noncoding/genetics , RNA, Long Noncoding/metabolism , Animals , Disease/genetics , Exosomes/genetics , Exosomes/metabolism , Gene Expression Profiling , Humans , Mice , Nucleic Acid Conformation , Polymorphism, Single Nucleotide , RNA, Long Noncoding/chemistry
18.
Oncotarget ; 8(21): 34374-34386, 2017 May 23.
Article in English | MEDLINE | ID: mdl-28423735

ABSTRACT

Long non-coding RNAs are known to be involved in cancer progression, but their biological functions and prognostic values are still largely unexplored in diffuse large B-cell lymphoma. In this study, long non-coding RNAs expression was characterized in 1,403 samples including normal and diffuse large B-cell lymphoma by repurposing 7 microarray datasets. Compared with any stage of normal B cells, NONHSAG026900 expression was significantly decreased in tumor samples. And in germinal center B-cell subtype, the significantly higher expression of NONHSAG026900 indicated it was a favorable prognosis biomarker. Then the prognostic power of NONHSAG026900 was validated with another independent dataset and NONHSAG026900 improved the predictive power of International Prognostic Index as an independent factor. Moreover, functional prediction and validation demonstrated that NONHSAG026900 could inhibit cell cycle activity to restrain tumor proliferation. These findings identified NONHSAG026900 as a novel prognostic biomarker and offered a new therapeutic target for diffuse large B-cell lymphoma patients.


Subject(s)
Biomarkers, Tumor/genetics , Lymphoma, Large B-Cell, Diffuse/genetics , Lymphoma, Large B-Cell, Diffuse/pathology , RNA, Long Noncoding/genetics , Down-Regulation , Female , Gene Expression Regulation, Neoplastic , Humans , Male , Neoplasm Staging , Oligonucleotide Array Sequence Analysis , Prognosis , Survival Analysis
19.
Brief Bioinform ; 18(5): 789-797, 2017 09 01.
Article in English | MEDLINE | ID: mdl-27439532

ABSTRACT

RNA-seq technology offers the promise of rapid comprehensive discovery of long intervening noncoding RNAs (lincRNAs). Basic tools such as Tophat and Cufflinks have been widely used for RNA-seq assembly. However, advanced bioinformatics methodologies that allow in-depth analysis of lincRNAs are lacking. Here, we describe a computational protocol that is especially designed for the identification of novel lincRNAs and the prediction of their function. The protocol mainly includes two open-access tools, CNCI and ncFANs. CNCI allows users to distinguish noncoding from protein-coding transcripts and to retrieve novel lincRNAs. ncFANs integrates expression profiles of protein-coding and lincRNA genes to construct coexpression networks. Such networks are subsequently used to perform function predictions of unknown lincRNAs. This protocol will allow users to apply these procedures without the need for additional training. All the tools in the current protocol are available at http://www.bioinfo.org/np/.


Subject(s)
RNA, Long Noncoding/genetics , Computational Biology , Proteins
20.
Front Genet ; 8: 230, 2017.
Article in English | MEDLINE | ID: mdl-29387082

ABSTRACT

RNA editing is a post-transcriptional event that leads to transcriptome diversity and has been shown to play important roles in tumorigenesis. However, dynamical changes and the functional significance of editing events during different cancer stages have not yet been characterized systematically. In this paper, we describe a comprehensive study of the RNA editome of four samples from different cancer stages for the same patient based on analysis of both whole-genome and transcriptome sequencing data. We identified 35,225 and 33,784 RNA editing events for poly(A)+ and poly(A)- RNA sequencing data respectively in all four samples and show that 93 and 90% correspond to cancer stage-specific editing events. We also found that half of editing sites in 3' UTR of coding genes were microRNA targets and most of the sites in the coding regions could lead to non-synonymous amino acid changes. Functional analysis of genes which suffered damaging non-synonymous editing events in each cancer stage show the gradual expansion of cancer related pathways accompanied by an increasing malignant grade of the samples. Our study, for the first time to our knowledge, comprehensively profiled and compared the editomes across the different cancer stages and revealed the functional impacts of RNA editing events during cancer development and progression.
