ABSTRACT
RNA decay is an important posttranscriptional mechanism for regulating gene expression. Although the main pathways and enzymes that mediate this process are well defined, global analysis of RNA turnover remains limited. Recent advances in next-generation sequencing now make it possible to examine RNA decay patterns on a genome-wide scale. In this study, we investigated human RNA decay patterns using parallel analysis of RNA end-sequencing (PARE-seq) data from XRN1-knockdown HeLa cell lines, and compared steady-state mRNA levels, measured by RNA-seq, with levels of degraded mRNA, measured by PARE-seq. This comparison identified 1103 stable and 1347 unstable candidate transcripts. Among the unstable candidates, a subset of the replication-dependent histone transcripts was polyadenylated and rapidly degraded. Additionally, we identified 380 candidates for endonucleolytic cleavage by analyzing the most abundant PARE sequence on each transcript. Of these, 41.4% of the corresponding genes were also classified as unstable, suggesting that endonucleolytic cleavage may affect their mRNA stability. Furthermore, we identified 1877 decapped candidates, including HSP90B1 and SWI5, whose most abundant PARE sequences map to the 5'-end positions of the transcripts. These results provide a useful resource for further analysis of RNA decay patterns in human cells.
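The abstract does not state the exact classification criteria, so the following is only a minimal sketch of the two ideas it describes, assuming per-transcript RNA-seq levels and PARE 5'-end read counts are already available. The log2 PARE/RNA-seq ratio, the pseudocount, the cutoff of 1.0, and the `five_prime_window` parameter are illustrative assumptions rather than the study's thresholds, and `Transcript`, `decay_ratio`, `classify_stability`, and `classify_peak` are hypothetical names.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Transcript:
    """Hypothetical per-transcript summary; field names are illustrative."""
    name: str
    rnaseq_level: float  # steady-state abundance from RNA-seq (normalized)
    # PARE read 5'-end position on the transcript (0 = first nucleotide) -> read count
    pare_counts: dict[int, int] = field(default_factory=dict)


def decay_ratio(t: Transcript, pseudocount: float = 1.0) -> float:
    """log2 of total PARE (decay) signal over the RNA-seq (steady-state) level."""
    pare_total = sum(t.pare_counts.values())
    return math.log2((pare_total + pseudocount) / (t.rnaseq_level + pseudocount))


def classify_stability(t: Transcript, cutoff: float = 1.0) -> str:
    """Assumed rule: many decay intermediates relative to steady state = unstable."""
    ratio = decay_ratio(t)
    if ratio >= cutoff:
        return "unstable"
    if ratio <= -cutoff:
        return "stable"
    return "unclassified"


def classify_peak(t: Transcript, five_prime_window: int = 0) -> str:
    """Label a transcript by the position of its most abundant PARE 5' end."""
    if not t.pare_counts:
        return "no_signal"
    peak_pos = max(t.pare_counts, key=t.pare_counts.get)
    if peak_pos <= five_prime_window:
        return "decapped_candidate"   # peak at the transcript's 5' end
    return "cleavage_candidate"       # peak at an internal (putative cleavage) site


if __name__ == "__main__":
    example = Transcript("example", rnaseq_level=10.0, pare_counts={0: 100, 20: 3})
    print(classify_stability(example), classify_peak(example))  # unstable decapped_candidate
```

Under these assumed settings, a transcript with a strong PARE peak at position 0 and low steady-state abundance is labeled both unstable and a decapped candidate; a dominant internal peak would instead mark it as a cleavage candidate.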
Subject(s)
Gene Expression Regulation/genetics; RNA Stability/physiology; Base Sequence/genetics; Databases, Genetic; Genome/genetics; HeLa Cells; High-Throughput Nucleotide Sequencing/methods; Histones/metabolism; Humans; RNA, Messenger/genetics; Sequence Analysis, RNA/methods; Whole Genome Sequencing/methods
ABSTRACT
BACKGROUND: Identifying gene-phenotype relationships is important in medical genetics because it provides a basis for precision medicine. However, most gene-phenotype relationship data remain buried in the biomedical literature as free text. OBJECTIVE: We propose RelCurator, a curation system that extracts sentences containing both gene and phenotype entities related to specific disease categories from PubMed articles and provides additional information such as entity tags and predicted gene-phenotype relationships. METHODS: Targeting neurodegenerative disorders, we developed a deep learning model that uses Bidirectional Gated Recurrent Unit (BiGRU) networks and BioWordVec word embeddings to predict gene-phenotype relationships from biomedical text. The model was trained on more than 130,000 labeled PubMed sentences containing gene and phenotype entities that are either related or unrelated to neurodegenerative disorders. RESULTS: We compared the performance of our model with that of Bidirectional Encoder Representations from Transformers (BERT), Support Vector Machine (SVM), and simple Recurrent Neural Network (simple RNN) models. Our model outperformed these models, achieving an F1-score of 0.96. Furthermore, an evaluation on several real-world curation cases demonstrated the effectiveness of our approach. We therefore conclude that RelCurator can identify not only new causative genes but also new genes associated with the phenotypes of neurodegenerative disorders. CONCLUSION: RelCurator offers user-friendly access to deep learning-based supporting information through a concise web interface that assists curators while they browse PubMed articles. Our curation process represents an important and broadly applicable improvement to the state of the art for curating gene-phenotype relationships.
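The sentence-selection step described under OBJECTIVE (keeping only sentences that mention both a gene and a phenotype) can be sketched as below. This is a minimal illustration, not RelCurator's implementation: entity recognition is replaced by exact dictionary lookup, and `GENES`, `PHENOTYPES`, `tag_entities`, and `candidate_sentences` are hypothetical. In the described system, the selected sentences would then be scored by the BiGRU relationship classifier, which is not reproduced here.

```python
import re

# Hypothetical toy dictionaries; a real curation system would use a biomedical NER tool.
GENES = {"SNCA", "LRRK2", "APP", "HTT"}
PHENOTYPES = {"tremor", "dementia", "bradykinesia", "chorea"}

TOKEN = re.compile(r"[A-Za-z0-9-]+")


def tag_entities(sentence: str) -> tuple[set[str], set[str]]:
    """Return (genes, phenotypes) found by exact token lookup.

    Gene symbols are matched case-sensitively; phenotype terms are lowercased.
    """
    tokens = TOKEN.findall(sentence)
    genes = {t for t in tokens if t in GENES}
    phenos = {t.lower() for t in tokens if t.lower() in PHENOTYPES}
    return genes, phenos


def candidate_sentences(text: str) -> list[dict]:
    """Keep only sentences mentioning at least one gene and one phenotype."""
    hits = []
    # Naive sentence split on terminal punctuation followed by whitespace.
    for sent in re.split(r"(?<=[.!?])\s+", text.strip()):
        genes, phenos = tag_entities(sent)
        if genes and phenos:
            hits.append({"sentence": sent,
                         "genes": sorted(genes),
                         "phenotypes": sorted(phenos)})
    return hits


if __name__ == "__main__":
    abstract = ("Mutations in SNCA cause tremor and bradykinesia. "
                "APP was sequenced. Dementia is common.")
    for hit in candidate_sentences(abstract):
        print(hit)  # only the first sentence has both a gene and a phenotype
```

Filtering on co-occurrence first keeps the classifier's input small; each surviving sentence can then be expanded into (gene, phenotype) pairs for relationship prediction.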
Subject(s)
Data Mining; Neurodegenerative Diseases; Humans; Data Mining/methods; Neural Networks, Computer; Neurodegenerative Diseases/genetics
ABSTRACT
Genomic analysis begins with de novo assembly of short-read fragments to reconstruct full-length sequences without relying on a reference genome. In the subsequent annotation step, gene locations are identified within the assembled sequences, and the structures and functions of these genes are determined. A wide range of powerful tools for whole-genome analysis has recently been developed and published, enabling even individual researchers in small laboratories to analyze the whole genomes of organisms of interest. However, these tools are generally complex and use diverse algorithms, parameter-setting methods, and input formats, so it remains difficult for individual researchers to select, use, and combine them to obtain final results. To address these issues, we developed a genome analysis pipeline (GAAP) for semiautomated, iterative, and high-throughput analysis of whole-genome data. The pipeline performs read correction, de novo genome (or transcriptome) assembly, gene prediction, and functional annotation using a range of proven tools and databases. To assist researchers without an IT background, we describe each stage of the analysis in detail and discuss current approaches. We also provide practical advice on accessing and using the relevant bioinformatics tools and databases and on implementing the suggested steps. A whole-genome analysis of Toxocara canis serves as a case study, showing intermediate results at each stage and demonstrating the practicality of the proposed method.
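The stage ordering and the "iterative" aspect of such a pipeline can be illustrated with a small driver that runs stages in sequence, keeps each intermediate result, and can restart from a chosen stage while reusing earlier outputs. This is a hypothetical sketch, not GAAP itself: `run_pipeline` and the placeholder lambdas stand in for the real tools GAAP wraps (which are not named here), and the caching scheme is an assumption rather than GAAP's documented behavior.

```python
from typing import Callable, Optional

# A stage is a (name, function) pair; each function consumes the previous output.
Stage = tuple[str, Callable[[object], object]]


def run_pipeline(stages: list[Stage], data: object,
                 cache: Optional[dict[str, object]] = None,
                 start_at: Optional[str] = None) -> dict[str, object]:
    """Run stages in order and return every intermediate result by stage name.

    Stages before `start_at` reuse cached outputs when present. Once any stage
    is recomputed, all later stages are recomputed too, so results stay consistent.
    """
    results = dict(cache or {})  # copy so the caller's cache is not modified
    rerun = start_at is None
    for name, func in stages:
        if name == start_at:
            rerun = True
        if not rerun and name in results:
            data = results[name]  # reuse the stored intermediate result
            continue
        data = func(data)
        results[name] = data      # store for inspection and later re-runs
        rerun = True              # downstream stages depend on this new output
    return results


# Placeholder stages (hypothetical): they only tag their input, standing in
# for real read-correction, assembly, gene-prediction, and annotation tools.
STAGES: list[Stage] = [
    ("read_correction", lambda d: f"corrected({d})"),
    ("assembly", lambda d: f"assembled({d})"),
    ("gene_prediction", lambda d: f"genes({d})"),
    ("functional_annotation", lambda d: f"annotated({d})"),
]

if __name__ == "__main__":
    first = run_pipeline(STAGES, "reads.fq")
    print(first["functional_annotation"])  # annotated(genes(assembled(corrected(reads.fq))))
    # Swap in a different gene predictor and re-run only from that stage onward.
    revised = STAGES[:2] + [("gene_prediction", lambda d: f"genes_v2({d})")] + STAGES[3:]
    again = run_pipeline(revised, "reads.fq", cache=first, start_at="gene_prediction")
    print(again["functional_annotation"])  # annotated(genes_v2(assembled(corrected(reads.fq))))
```

Keeping every intermediate result by stage name mirrors the idea of inspecting outputs at each stage, and restarting from a later stage avoids repeating expensive steps such as assembly when only downstream parameters change.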