1.
Nat Methods ; 20(8): 1159-1169, 2023 08.
Article in English | MEDLINE | ID: mdl-37443337

ABSTRACT

The detection of circular RNA molecules (circRNAs) is typically based on short-read RNA sequencing data processed using computational tools. Numerous such tools have been developed, but a systematic comparison with orthogonal validation is missing. Here, we set up a circRNA detection tool benchmarking study, in which 16 tools detected more than 315,000 unique circRNAs in three deeply sequenced human cell types. Next, 1,516 predicted circRNAs were validated using three orthogonal methods. Generally, tool-specific precision is high and similar (median of 98.8%, 96.3% and 95.5% for qPCR, RNase R and amplicon sequencing, respectively) whereas the sensitivity and number of predicted circRNAs (ranging from 1,372 to 58,032) are the most significant differentiators. Of note, precision values are lower when evaluating low-abundance circRNAs. We also show that the tools can be used complementarily to increase detection sensitivity. Finally, we offer recommendations for future circRNA detection and validation.
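The precision and sensitivity figures above, and the remark that tools can be combined to raise sensitivity, can be illustrated with a small sketch. It is a simplification under stated assumptions, not the study's actual evaluation procedure: the circRNA IDs, the tool names, and the definition of sensitivity as "share of validated circRNAs recovered" are hypothetical choices made for illustration.

```python
from typing import Dict, Set


def precision(predicted: Set[str], validated: Dict[str, bool]) -> float:
    """Fraction of a tool's tested predictions that passed validation."""
    tested = [c for c in predicted if c in validated]
    if not tested:
        raise ValueError("none of this tool's predictions were tested")
    return sum(validated[c] for c in tested) / len(tested)


def sensitivity(predicted: Set[str], validated: Dict[str, bool]) -> float:
    """Fraction of all validated circRNAs that the prediction set recovers."""
    true_circs = {c for c, ok in validated.items() if ok}
    if not true_circs:
        raise ValueError("no validated circRNAs available")
    return len(predicted & true_circs) / len(true_circs)


# Toy validation outcome (hypothetical IDs): True means the circRNA was confirmed.
validated = {"c1": True, "c2": True, "c3": True, "c4": False, "c5": True}

# Predictions from two hypothetical tools.
tool_a = {"c1", "c2", "c4"}
tool_b = {"c2", "c3", "c5"}

for name, preds in [("tool_a", tool_a), ("tool_b", tool_b), ("union", tool_a | tool_b)]:
    print(name, round(precision(preds, validated), 3), round(sensitivity(preds, validated), 3))
```

In this toy case the union of both tools recovers every validated circRNA, while its precision falls between the two individual values, which is the trade-off to keep in mind when combining tools.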


Subjects
Benchmarking , Circular RNA , Humans , Circular RNA/genetics , RNA/genetics , RNA/metabolism , RNA Sequence Analysis/methods
2.
Brief Bioinform ; 23(6), 2022 11 19.
Article in English | MEDLINE | ID: mdl-36259363

ABSTRACT

Robust strategies to identify patients at high risk for tumor metastasis, such as those frequently observed in intrahepatic cholangiocarcinoma (ICC), remain limited. While gene/protein expression profiling holds great potential as an approach to cancer diagnosis and prognosis, previously developed protocols using multiple diagnostic signatures for expression-based metastasis prediction have not been widely applied successfully because batch effects and different data types greatly decreased the predictive performance of gene/protein expression profile-based signatures in interlaboratory and data type dependent validation. To address this problem and assist in more precise diagnosis, we performed a genome-wide integrative proteome and transcriptome analysis and developed an ensemble machine learning-based integration algorithm for metastasis prediction (EMLI-Metastasis) and risk stratification (EMLI-Prognosis) in ICC. Based on massive proteome (216) and transcriptome (244) data sets, 132 feature (biomarker) genes were selected and used to train the EMLI-Metastasis algorithm. To accurately detect the metastasis of ICC patients, we developed a weighted ensemble machine learning method based on k-Top Scoring Pairs (k-TSP) method. This approach generates a metastasis classifier for each bootstrap aggregating training data set. Ten binary expression rank-based classifiers were generated for detection of metastasis separately. To further improve the accuracy of the method, the 10 binary metastasis classifiers were combined by weighted voting based on the score from the prediction results of each classifier. The prediction accuracy of the EMLI-Metastasis algorithm achieved 97.1% and 85.0% in proteome and transcriptome datasets, respectively. Among the 132 feature genes, 21 gene-pair signatures were developed to establish a metastasis-related prognosis risk-stratification model in ICC (EMLI-Prognosis). 
Based on the EMLI-Prognosis algorithm, patients in the high-risk group had significantly worse overall survival than those in the low-risk group in the clinical cohort (P-value < 0.05). Taken together, the EMLI-ICC algorithm provides a powerful and robust means for accurate metastasis prediction and risk stratification across proteome and transcriptome data types that is superior to currently used clinicopathological features in patients with ICC. Our algorithm could have profound implications not just for improved clinical care in cancer metastasis risk prediction, but also more broadly for the development of machine-learning-based multi-cohort diagnosis methods. To make the EMLI-ICC algorithm easily accessible for clinical application, we established a web-based server for metastasis risk prediction (http://ibi.zju.edu.cn/EMLI/).
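The general idea described above (rank-based k-TSP classifiers trained on bootstrap samples and combined by weighted voting) can be sketched as follows. This is a minimal illustration, not the EMLI-Metastasis implementation: the function names, the toy data, and the choice of training accuracy as the voting weight are all assumptions, since the abstract does not specify how the weights are computed.

```python
import random
from typing import List, Sequence, Tuple

Pair = Tuple[int, int]  # (i, j): one vote for class 1 when expr[i] > expr[j]


def pair_score(X: Sequence[Sequence[float]], y: Sequence[int], i: int, j: int) -> float:
    """TSP score: P(x_i > x_j | class 1) - P(x_i > x_j | class 0).

    Assumes both classes are present in y.
    """
    def freq(label: int) -> float:
        rows = [x for x, lab in zip(X, y) if lab == label]
        return sum(x[i] > x[j] for x in rows) / len(rows)
    return freq(1) - freq(0)


def fit_ktsp(X: Sequence[Sequence[float]], y: Sequence[int], k: int) -> List[Pair]:
    """Pick up to k top-scoring gene pairs that share no gene."""
    n_genes = len(X[0])
    scored = [(pair_score(X, y, i, j), (i, j))
              for i in range(n_genes) for j in range(n_genes) if i != j]
    scored.sort(key=lambda t: t[0], reverse=True)
    chosen: List[Pair] = []
    used = set()
    for _, (i, j) in scored:
        if i in used or j in used:
            continue
        chosen.append((i, j))
        used.update((i, j))
        if len(chosen) == k:
            break
    return chosen


def predict_ktsp(pairs: Sequence[Pair], x: Sequence[float]) -> int:
    """Unweighted majority vote over the selected pairs."""
    votes = sum(x[i] > x[j] for i, j in pairs)
    return 1 if votes * 2 > len(pairs) else 0


def fit_ensemble(X, y, k: int = 1, n_models: int = 10, seed: int = 0):
    """Train one k-TSP classifier per bootstrap sample.

    Weight = accuracy on the full training set (an assumption for this sketch).
    """
    rng = random.Random(seed)
    n = len(X)
    models = []
    while len(models) < n_models:
        idx = [rng.randrange(n) for _ in range(n)]
        yb = [y[t] for t in idx]
        if len(set(yb)) < 2:  # a bootstrap sample must contain both classes
            continue
        pairs = fit_ktsp([X[t] for t in idx], yb, k)
        acc = sum(predict_ktsp(pairs, x) == lab for x, lab in zip(X, y)) / n
        models.append((pairs, acc))
    return models


def predict_ensemble(models, x: Sequence[float]) -> int:
    """Weighted vote: class 1 wins if it holds more than half of the total weight."""
    total = sum(w for _, w in models)
    positive = sum(w for pairs, w in models if predict_ktsp(pairs, x) == 1)
    return 1 if positive * 2 > total else 0


# Toy expression matrix (hypothetical): gene 0 > gene 1 marks class 1 (metastasis).
X = [[5, 1, 3, 3.5], [6, 2, 4, 1], [7, 3, 2, 2.5],
     [1, 5, 3, 2], [2, 6, 1, 4], [3, 7, 4, 3]]
y = [1, 1, 1, 0, 0, 0]
models = fit_ensemble(X, y, k=1, n_models=10)
```

Because each classifier only compares the relative order of two genes within a sample, the prediction is unaffected by monotone rescaling of that sample, which is the usual argument for rank-based signatures being less sensitive to batch effects and platform differences.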


Subjects
Bile Duct Neoplasms , Cholangiocarcinoma , Humans , Proteome , Algorithms , Cholangiocarcinoma/genetics , Machine Learning , Bile Duct Neoplasms/genetics , Intrahepatic Bile Ducts/pathology , Risk Assessment
3.
Database (Oxford) ; 2022, 2022 07 05.
Article in English | MEDLINE | ID: mdl-35788653

ABSTRACT

Osteoarthritis (OA) is the most common form of arthritis in the adult population and is a leading cause of disability. OA-related genetic loci may play an important role in clinical diagnosis and disease progression. With the rapid development of diverse technologies and omics methods, many OA-related public data sets have been accumulated. Here, we retrieved a diverse set of omics experimental results from 159 publications, including genome-wide association study, differentially expressed genes and differential methylation regions, and 2405 classified OA-related gene markers. Meanwhile, based on recent single-cell RNA-seq data from different joints, 5459 cell-type gene markers of joints were collected. The information has been integrated into an online database named OAomics and molecular biomarkers (OAOB). The database (http://ibi.zju.edu.cn/oaobdb/) provides a web server for OA marker genes, omics features and so on. To our knowledge, this is the first database of molecular biomarkers for OA.


Subjects
Genome-Wide Association Study , Osteoarthritis , Factual Databases , Genetic Markers , Humans , Osteoarthritis/genetics
4.
Plant Commun ; 3(4): 100343, 2022 07 11.
Article in English | MEDLINE | ID: mdl-35637632

ABSTRACT

Circular RNA (circRNA) is a special type of non-coding RNA that participates in diverse biological processes in both animals and plants. Five years ago, we developed a comprehensive plant circRNA database (PlantcircBase), which has attracted much attention from the plant circRNA community. Here, we report an updated PlantcircBase (v.7.0), which contains 171,118 circRNAs from 21 plant species. Over 31,000 of the circRNAs have full-length sequences constructed based on analysis of 749 bulk RNA sequencing (RNA-seq) datasets downloaded from the public domain and Nanopore long-read sequencing results of rice RNAs newly generated in this study. A plant multiple conservation score (PMCS), based on the conservation of both sequence and expression profiles, was calculated for each circRNA to quantify and compare the conservation of all circRNAs. A new parameter, plant circRNA confidence level (PCCL), is introduced to measure the identity reliability of each circRNA based on experimental validation results and the number of references that support the circRNA. All this information and other details of circRNAs can be browsed, searched, and downloaded from PlantcircBase 7.0, which also provides online bioinformatics tools for visualization and sequence alignment. PlantcircBase 7.0 is publicly and freely accessible at http://ibi.zju.edu.cn/plantcircbase/.


Subjects
Nucleic Acid Databases , Circular RNA , Plants/genetics , Plants/metabolism , Circular RNA/genetics , Plant RNA/genetics , Plant RNA/metabolism , Reproducibility of Results , RNA Sequence Analysis/methods
5.
Mitochondrial DNA B Resour ; 7(1): 112-114, 2022.
Article in English | MEDLINE | ID: mdl-34993330

ABSTRACT

Secale strictum subsp. kuprijanovii is a perennial, hermaphrodite wild rye species and a progenitor of the modern cultivated rye, Secale cereale. With its high adaptive capacity under stress conditions, it is valuable for enriching the germplasm resources of rye. Elucidating its genetic and phylogenetic relationships is therefore of great importance. Here we sequenced, assembled and present for the first time the complete chloroplast genome of this less-studied species. The whole genome is 137,079 bp in size, including a large single copy region of 81,099 bp, a small single copy region of 12,820 bp and two separate inverted repeat regions totaling 43,160 bp (81,099 + 12,820 + 43,160 = 137,079). A total of 109 unique genes were annotated, including 67 protein-coding genes, 38 tRNA genes and 4 rRNA genes. Phylogenetic analysis showed that Secale strictum subsp. kuprijanovii clustered most closely with Secale cereale. A remarkably close evolutionary relationship of S. strictum subsp. kuprijanovii with various wheat varieties may indicate its usefulness as a genetic resource for the breeding of both cultivated rye and wheat.

6.
New Phytol ; 233(1): 515-525, 2022 01.
Article in English | MEDLINE | ID: mdl-34643280

ABSTRACT

Circular RNA (circRNA) is a kind of new regulatory RNA with diverse biological functions. Numerous circRNAs have been identified in many plant species; however, evolution of plant circRNAs remains largely unknown. In this study, we assembled full-length sequences of 6519 rice (Oryza sativa) circRNAs and analyzed their conservation in another 46 plant species based on comparison of sequences and expression patterns. We found that, at the genomic level, 8.7% of the 6519 circRNAs were conserved in dicotyledonous plants and 49.1% in Oryza genus. Meanwhile, 57.8% of parental protein-coding genes of the rice circRNAs originated recently after divergence of monocotyledonous plants, implying recent origin of the majority of rice circRNAs, a conclusion further supported by the results based on analysis of 4663 full-length circRNAs in Arabidopsis thaliana. Accordingly, we proposed three models to address the origination of different types of circRNAs. Taken together, the results obtained in this study provide new insights for the evolutionary dynamics of plant circRNAs and candidate circRNAs for further functional exploration.


Subjects
Oryza , Circular RNA , Oryza/genetics , Plants/genetics , RNA/genetics , RNA Sequence Analysis
7.
Curr Issues Mol Biol ; 43(3): 1685-1697, 2021 Oct 20.
Article in English | MEDLINE | ID: mdl-34698115

ABSTRACT

Single-cell RNA (scRNA) profiling or scRNA-sequencing (scRNA-seq) makes it possible to investigate, in parallel, diverse molecular features of multiple types of cells in a given plant tissue and to discover cell developmental processes. In this study, we evaluated the effects of sample size (i.e., cell number) on the outcome of single-cell transcriptome analysis by sampling different numbers of cells from a pool of ~57,000 Arabidopsis thaliana root cells integrated from five published studies. Our results indicated that the most significant principal components could be achieved when 20,000-30,000 cells were sampled, a relatively high reliability of cell clustering could be achieved by using ~20,000 cells with little further improvement by using more cells, 96% of the differentially expressed genes could be successfully identified with no more than 20,000 cells, and a relatively stable pseudotime could be estimated in the subsample with 5000 cells. Finally, our results provide a general guide for optimizing sample size to be used in plant scRNA-seq studies.


Subjects
Gene Expression Profiling , Plant RNA , Single-Cell Analysis , Transcriptome , Arabidopsis/genetics , Cell Count , Cluster Analysis , Computational Biology/methods , High-Throughput Nucleotide Sequencing , Organ Specificity/genetics , Plants/genetics , RNA Sequence Analysis , Single-Cell Analysis/methods , Single-Cell Analysis/standards
8.
PLoS One ; 16(9): e0257878, 2021.
Article in English | MEDLINE | ID: mdl-34587184

ABSTRACT

Extracellular microRNAs (miRNAs) have been proposed to function in cross-kingdom gene regulation. Among these, plant-derived miRNAs of dietary origin have been reported to survive the harsh conditions of the human digestive system, enter the circulatory system, and regulate gene expression and metabolic function. However, definitive evidence supporting the presence of plant-derived miRNAs of dietary origin in mammals has been difficult to obtain due to limited sample sizes. We have developed a bioinformatics pipeline (ePmiRNA_finder) that provides stringent miRNA classification and applied it to analyze 421 small RNA sequencing data sets from 10 types of human body fluids and tissues and comparative samples from carnivores and herbivores. A total of 35 miRNAs were identified that map to plants typically found in the human diet; these miRNAs were found in at least one human blood sample, and their abundance was significantly different when compared to samples from human microbiome or cow. The plant-derived miRNA profiles were body fluid/tissue-specific and highly abundant in the brain and the breast milk samples, indicating selective absorption and/or the ability to be transported across tissue/organ barriers. Our data provide conclusive evidence for the presence of plant-derived miRNAs as a consequence of dietary intake and their cross-kingdom regulatory function within the human circulatory system.


Subjects
Computational Biology/methods , MicroRNAs/genetics , Plants/genetics , RNA Sequence Analysis/methods , Animal Feed/analysis , Animals , Brain Chemistry , Carnivora/genetics , Diet , Female , Herbivory/genetics , Humans , Human Milk/chemistry , Organ Specificity , Plant RNA/genetics , Sample Size
9.
Sci Rep ; 11(1): 16715, 2021 08 18.
Article in English | MEDLINE | ID: mdl-34408184

ABSTRACT

Exposure to cigarette smoke (CS) results in injury to the epithelial cells of the human respiratory tract and has been implicated as a causative factor in the development of chronic obstructive pulmonary disease and lung cancers. The application of omics-scale methodologies has improved the capacity to understand cellular signaling processes underlying response to CS exposure. We report here the development of an algorithm based on quantitative assessment of transcriptomic profiles and signaling pathway perturbation analysis (SPPA) of human bronchial epithelial cells (HBEC) exposed to the toxic components present in CS. HBEC were exposed to CS of different compositions and for different durations using an ISO3308 smoking regime and the impact of exposure was monitored in 2263 signaling pathways in the cell to generate a total effect score that reflects the quantitative degree of impact of external stimuli on the cells. These findings support the conclusion that the SPPA algorithm provides an objective, systematic, sensitive means to evaluate the biological impact of exposures to CS of different compositions making a powerful comparative tool for commercial product evaluation and potentially for other known or potentially toxic environmental smoke substances.


Subjects
Epithelial Cells/metabolism , Lung/metabolism , Signal Transduction , Smoking/metabolism , Cell Line , Humans
10.
Bioinformatics ; 37(22): 4115-4122, 2021 11 18.
Article in English | MEDLINE | ID: mdl-34048541

ABSTRACT

MOTIVATION: Single-cell RNA sequencing (scRNA-seq) has enabled the characterization of different cell types in many tissues and tumor samples. Cell type identification is essential for single-cell RNA profiling, which is currently transforming the life sciences. Often, this is achieved by searching for combinations of genes that have previously been implicated as being cell-type specific, an approach that is not quantitative and does not explicitly take advantage of other scRNA-seq studies. Batch effects and different data platforms greatly decrease the predictive performance in inter-laboratory and different data type validation. RESULTS: Here, we present a new ensemble learning method named 'scDetect' that combines gene expression rank-based analysis and a majority vote ensemble machine-learning probability-based prediction method capable of highly accurate classification of cells based on scRNA-seq data from different sequencing platforms. Because of tumor heterogeneity, in order to accurately predict tumor cells in the single-cell RNA-seq data, we have also incorporated cell copy number variation consensus clustering and epithelial score in the classification. We applied scDetect to scRNA-seq data from pancreatic tissue, mononuclear cells and tumor biopsy cells and show that scDetect classified individual cells with high accuracy and better than other publicly available tools. AVAILABILITY AND IMPLEMENTATION: scDetect is open-source software. Source code and test data are freely available from GitHub (https://github.com/IVDgenomicslab/scDetect/) and Zenodo (https://zenodo.org/record/4764132#.YKCOlrH5AYN). Examples and a tutorial are available at https://ivdgenomicslab.github.io/scDetect-Introduction/. scDetect will also be available from Bioconductor. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
DNA Copy Number Variations , Neoplasms , Humans , RNA Sequence Analysis , Single-Cell Analysis , Algorithms
12.
Brief Bioinform ; 22(2): 2106-2118, 2021 03 22.
Article in English | MEDLINE | ID: mdl-32266390

ABSTRACT

Gene expression profiling holds great potential as a new approach to histological diagnosis and precision medicine of cancers of unknown primary (CUP). Batch effects and different data types greatly decrease the predictive performance of biomarker-based algorithms, and few methods have been widely applied to identify the tissue origin of CUP to date. To address this problem and assist in more precise diagnosis, we have developed a gene expression rank-based majority vote algorithm for tissue origin diagnosis of CUP (TOD-CUP) of most common cancer types. Based on massive tissue-specific RNA-seq data sets (10 553) found in The Cancer Genome Atlas (TCGA), 538 feature genes (biomarkers) were selected based on their gene expression ranks and used to predict tissue types. The top scoring pairs (TSPs) classifier of the tumor type was optimized using the TCGA training samples. To test the prediction accuracy of our TOD-CUP algorithm, we analyzed (1) two microarray data sets (1029 Agilent and 2277 Affymetrix/Illumina chips) and found 91% and 94% prediction accuracy, respectively, (2) RNA-seq data from five cancer types derived from 141 public metastatic cancer tumor samples and achieved 94% accuracy and (3) a total of 25 clinical cancer samples (including 14 metastatic cancer samples) and classified 24/25 samples correctly (96.0% accuracy). Taken together, the TOD-CUP algorithm provides a powerful and robust means to accurately identify the tissue origin of 24 cancer types across different data platforms. To make the TOD-CUP algorithm easily accessible for clinical application, we established a Web-based server for tumor tissue origin diagnosis (http://ibi.zju.edu.cn/todcup/).


Subjects
Gene Expression , Unknown Primary Neoplasms/genetics , Algorithms , Tumor Biomarkers/metabolism , Humans , Unknown Primary Neoplasms/pathology , Oligonucleotide Array Sequence Analysis , RNA Sequence Analysis/methods
13.
Brief Bioinform ; 21(1): 135-143, 2020 Jan 17.
Article in English | MEDLINE | ID: mdl-30445438

ABSTRACT

Circular RNAs (circRNAs) are covalently closed single-stranded RNA molecules that have been shown to play important roles in transcriptional regulation of genes in diverse species. With the rapid development of bioinformatics tools, a large number (95143) of circRNAs have been identified from different plant species, providing an opportunity for uncovering the overall characteristics of plant circRNAs. Here, based on publicly available circRNAs, we comprehensively analyzed characteristics of plant circRNAs with the help of various bioinformatics tools as well as in-house scripts and workflows, including the percentage of coding genes generating circRNAs, the frequency of alternative splicing events of circRNAs, the non-canonical splicing signals of circRNAs and the networks involving circRNAs, miRNAs and mRNAs. All this information has been integrated into an upgraded online database, PlantcircBase 3.0 (http://ibi.zju.edu.cn/plantcircbase/). In this database, we provide browse, search and visualization tools as well as a web-based BLAST tool, BLASTcirc, for prediction of circRNAs from query sequences based on searching against plant genomes and transcriptomes.

14.
Database (Oxford) ; 2019, 2019 01 01.
Article in English | MEDLINE | ID: mdl-31560050

ABSTRACT

Rice (Oryza sativa L.) is one of the most important crops worldwide. Its relatives, including phylogenetically related species of rice and paddy weeds with a similar ecological niche, can provide crucial genetic resources (such as resistance to biotic and abiotic stresses and high photosynthetic efficiency) for rice research. Although many rice genomic databases have been constructed, a database providing large-scale curated genomic data from rice relatives and offering specific gene resources is still lacking. Here, we present RiceRelativesGD, a user-friendly genomic database of rice relatives. RiceRelativesGD integrates large-scale genomic resources from 2 cultivated rice and 11 rice relatives, including 208 321 specific genes and 13 643 genes related to photosynthesis and responsive to external stimuli. Diverse bioinformatics tools are embedded in the database, which allow users to search, visualize and download the information of interest. To our knowledge, this is the first genomic database providing a centralized genetic resource of rice relatives. RiceRelativesGD will serve as a significant and comprehensive knowledgebase for the rice community.


Subjects
Data Curation , Genetic Databases , Plant Genome , Oryza , Oryza/genetics , Oryza/metabolism
15.
Front Genet ; 9: 34, 2018.
Article in English | MEDLINE | ID: mdl-29479369

ABSTRACT

Circular RNAs (circRNAs) have been reported as potential biomarkers for colorectal cancers (CRC) and other types of tumors. However, only a limited number of studies have investigated the potential role of circRNAs in tumor metastasis. Here, we examined the circRNAs in two CRC cell lines (a primary tumor cell line, SW480, and its metastatic counterpart, SW620), and found a large set of circRNAs (2,919 ncDECs) with significantly differential expression patterns relative to normal cells (NCM460). In addition, we uncovered a set of 623 pmDECs that differ between the primary CRC cells and their metastatic cells. Both differentially expressed circRNA (DEC) sets contain many previously unknown putative CRC-related circRNAs, thereby providing many new circRNAs as candidate biomarkers for CRC development and metastasis. This study is the first large-scale identification of metastasis-related circRNAs for CRC and provides valuable candidate biomarkers for diagnosis and a starting point for additional investigations of CRC metastasis.

16.
Nat Commun ; 8(1): 1031, 2017 10 18.
Artigo em Inglês | MEDLINE | ID: mdl-29044108

RESUMO

Barnyardgrass (Echinochloa crus-galli) is a pernicious weed in agricultural fields worldwide. The molecular mechanisms underlying its success in the absence of human intervention are presently unknown. Here we report a draft genome sequence of the hexaploid species E. crus-galli, i.e., a 1.27 Gb assembly representing 90.7% of the predicted genome size. An extremely large repertoire of genes encoding cytochrome P450 monooxygenases and glutathione S-transferases associated with detoxification is found. Two gene clusters, involved in the biosynthesis of the allelochemical 2,4-dihydroxy-7-methoxy-1,4-benzoxazin-3-one (DIMBOA) and the phytoalexin momilactone A, respectively, are found in the E. crus-galli genome. The allelochemical DIMBOA gene cluster is activated in response to co-cultivation with rice, while the phytoalexin momilactone A gene cluster is activated specifically in response to infection by pathogenic Pyricularia oryzae. Our results provide a new understanding of the molecular mechanisms underlying the extreme adaptation of the weed.


Subjects
Echinochloa/physiology , Plant Genome , Plant Weeds/physiology , Physiological Adaptation , Echinochloa/genetics , Echinochloa/growth & development , Genome Size , Oryza/growth & development , Pheromones/metabolism , Plant Proteins/genetics , Plant Weeds/genetics , Plant Weeds/growth & development
18.
RNA Biol ; 14(8): 1055-1063, 2017 08 03.
Article in English | MEDLINE | ID: mdl-27739910

ABSTRACT

Circular RNAs (circRNAs) have been identified in diverse eukaryotic species and are characterized by RNA backsplicing events. Currently available methods for circRNA identification are able to determine the start and end locations of circRNAs in the genome but not their full-length sequences. In this study, we developed a method to assemble the full-length sequences of circRNAs using the backsplicing RNA-Seq reads and their corresponding paired-end reads. By applying the method to an rRNA-depleted/RNase R-treated RNA-Seq dataset, we identified, for the first time, full-length sequences of nearly 3,000 circRNAs in rice. We further showed that alternative circularization of circRNA is a common feature in rice and, surprisingly, found that the junction sites of a large number of rice circRNAs are flanked by diverse non-GT/AG splicing signals while most human exonic circRNAs are flanked by canonical GT/AG splicing signals. Our study provides a method for genome-wide identification of full-length circRNAs and expands our understanding of splicing signals of circRNAs.


Subjects
Alternative Splicing , Plant Genome , Oryza/genetics , RNA Splice Sites , Plant RNA/genetics , RNA/genetics , Base Sequence , Datasets as Topic , Exoribonucleases/chemistry , Humans , Oryza/metabolism , RNA/chemistry , RNA/metabolism , RNA Stability , Circular RNA , Plant RNA/chemistry , Plant RNA/metabolism , Ribosomal RNA/chemistry , RNA Sequence Analysis