Search | BVS (Virtual Health Library) Regional Portal

Accurate transcriptome-wide identification and quantification of alternative polyadenylation from RNA-seq data with APAIQ.

Long, Yongkang; Zhang, Bin; Tian, Shuye; Chan, Jia Jia; Zhou, Juexiao; Li, Zhongxiao; Li, Yisheng; An, Zheng; Liao, Xingyu; Wang, Yu; Sun, Shiwei; Xu, Ying; Tay, Yvonne; Chen, Wei; Gao, Xin.

Genome Res ; 33(4): 644-657, 2023 04.

Article in English | MEDLINE | ID: mdl-37117035

ABSTRACT

Alternative polyadenylation (APA) enables a gene to generate multiple transcripts with different 3' ends, a process that is dynamic across cell types and conditions. Many computational methods have been developed to characterize sample-specific APA using the corresponding RNA-seq data, but they suffer from high error rates in both polyadenylation site (PAS) identification and quantification of PAS usage (PAU), and from a bias toward 3' untranslated regions. Here we developed a tool for APA identification and quantification (APAIQ) from RNA-seq data, which can accurately identify PAS and quantify PAU in a transcriptome-wide manner. Using 3' end-seq data as the benchmark, we showed that APAIQ outperforms current methods on PAS identification and PAU quantification, including DaPars2, Aptardi, mountainClimber, SANPolyA, and QAPA. Finally, applying APAIQ to 421 RNA-seq samples from liver cancer patients, we identified >540 tumor-associated APA events and experimentally validated two intronic polyadenylation candidates, demonstrating its capacity to unveil cancer-related APA with a large-scale RNA-seq data set.

Subject(s)

Neoplasms; Transcriptome; Humans; Polyadenylation; RNA-Seq; Sequence Analysis, RNA/methods; Neoplasms/genetics; 3' Untranslated Regions
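
As a rough illustration of what quantifying PAS usage (PAU) means in the APAIQ record above, the sketch below computes each site's usage as its share of the reads supporting all PASs of one gene. The function name, the input layout, and the read counts are hypothetical, chosen only for illustration; the abstract does not describe APAIQ's actual procedure, so this should not be read as its implementation.

```python
def pas_usage(read_counts):
    """Return each PAS's share of the total supporting reads for one gene.

    read_counts: dict mapping a PAS label to a non-negative read count
    (a hypothetical input layout, not APAIQ's own format).
    """
    total = sum(read_counts.values())
    if total == 0:
        # No supporting reads at any site: report zero usage everywhere.
        return {site: 0.0 for site in read_counts}
    return {site: count / total for site, count in read_counts.items()}


# Made-up counts for a gene with a proximal, an intronic, and a distal site.
usage = pas_usage({"proximal": 30, "intronic": 10, "distal": 60})
print(usage)
```

Expressing usage as a within-gene fraction makes sites comparable across genes with very different expression levels, which is presumably why usage rather than raw coverage is the quantity being benchmarked.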

Annotating TSSs in Multiple Cell Types Based on DNA Sequence and RNA-seq Data via DeeReCT-TSS.

Zhou, Juexiao; Zhang, Bin; Li, Haoyang; Zhou, Longxi; Li, Zhongxiao; Long, Yongkang; Han, Wenkai; Wang, Mengran; Cui, Huanhuan; Li, Jingjing; Chen, Wei; Gao, Xin.

Genomics Proteomics Bioinformatics ; 20(5): 959-973, 2022 10.

Article in English | MEDLINE | ID: mdl-36528241

ABSTRACT

The accurate annotation of transcription start sites (TSSs) and their usage is critical for the mechanistic understanding of gene regulation in different biological contexts. To this end, specific high-throughput experimental technologies have been developed to capture TSSs in a genome-wide manner, and various computational tools have also been developed for in silico prediction of TSSs solely based on genomic sequences. Most of these computational tools cast the problem as a binary classification task on a balanced dataset, thus producing large numbers of false positive predictions when applied on the genome scale. Here, we present DeeReCT-TSS, a deep learning-based method that is capable of identifying TSSs across the whole genome based on both DNA sequence and conventional RNA sequencing data. We show that by effectively incorporating these two sources of information, DeeReCT-TSS significantly outperforms other solely sequence-based methods on the precise annotation of TSSs used in different cell types. Furthermore, we develop a meta-learning-based extension for simultaneous TSS annotations on 10 cell types, which enables the identification of cell type-specific TSSs. Finally, we demonstrate the high precision of DeeReCT-TSS on two independent datasets by correlating our predicted TSSs with experimentally defined TSS chromatin states. The source code for DeeReCT-TSS is available at https://github.com/JoshuaChou2018/DeeReCT-TSS_release and https://ngdc.cncb.ac.cn/biocode/tools/BT007316.

Subject(s)

Genomics; RNA-Seq; Base Sequence; Transcription Initiation Site; Sequence Analysis, RNA/methods
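
The DeeReCT-TSS abstract above notes that classifiers evaluated on balanced data yield many false positives at genome scale. The arithmetic behind that claim can be sketched with Bayes' rule: precision depends on how rare true sites are. The sensitivity, false positive rate, and prevalence values below are assumed for illustration only and do not come from the paper.

```python
def precision(sensitivity, false_positive_rate, prevalence):
    """Fraction of positive calls that are true positives.

    prevalence is the share of candidate positions that are real sites.
    """
    true_pos = sensitivity * prevalence
    false_pos = false_positive_rate * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


# Assumed classifier: 95% sensitivity, 5% false positive rate.
balanced = precision(0.95, 0.05, 0.5)        # balanced benchmark set
genome_scale = precision(0.95, 0.05, 1e-4)   # assumed: 1 true site per 10,000 positions

print(f"balanced: {balanced:.3f}, genome scale: {genome_scale:.4f}")
```

The same classifier looks accurate on the balanced set but has very low precision once negatives vastly outnumber positives, which is the failure mode the abstract attributes to sequence-only tools.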

DeeReCT-APA: Prediction of Alternative Polyadenylation Site Usage Through Deep Learning.

Li, Zhongxiao; Li, Yisheng; Zhang, Bin; Li, Yu; Long, Yongkang; Zhou, Juexiao; Zou, Xudong; Zhang, Min; Hu, Yuhui; Chen, Wei; Gao, Xin.

Genomics Proteomics Bioinformatics ; 20(3): 483-495, 2022 06.

Article in English | MEDLINE | ID: mdl-33662629

ABSTRACT

Alternative polyadenylation (APA) is a crucial step in post-transcriptional regulation. Previous bioinformatic studies have mainly focused on the recognition of polyadenylation sites (PASs) in a given genomic sequence, which is a binary classification problem. Recently, computational methods for predicting the usage level of alternative PASs in the same gene have been proposed. However, all of them cast the problem as a non-quantitative pairwise comparison task and do not take the competition among multiple PASs into account. To address this, here we propose a deep learning architecture, Deep Regulatory Code and Tools for Alternative Polyadenylation (DeeReCT-APA), to quantitatively predict the usage of all alternative PASs of a given gene. To accommodate different genes with potentially different numbers of PASs, DeeReCT-APA treats the problem as a regression task with a variable-length target. Based on a convolutional neural network-long short-term memory (CNN-LSTM) architecture, DeeReCT-APA extracts sequence features with CNN layers, uses bidirectional LSTM to explicitly model the interactions among competing PASs, and outputs percentage scores representing the usage levels of all PASs of a gene. In addition to the fact that only our method can quantitatively predict the usage of all the PASs within a gene, we show that our method consistently outperforms other existing methods on three different tasks for which they are trained: pairwise comparison task, highest usage prediction task, and ranking task. Finally, we demonstrate that our method can be used to predict the effect of genetic variations on APA patterns and sheds light on future mechanistic understanding in APA regulation. Our code and data are available at https://github.com/lzx325/DeeReCT-APA-repo.

Subject(s)

Deep Learning; Polyadenylation; Gene Expression Regulation; Neural Networks, Computer; Computational Biology/methods; 3' Untranslated Regions
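
The DeeReCT-APA abstract above describes a model that outputs percentage scores for however many PASs a gene has. One common way to turn a variable-length list of per-site scores into percentages is a softmax, sketched below. Using a softmax here is an assumption made for illustration; the abstract does not state which normalization the authors use, and the function name and input scores are hypothetical.

```python
import math


def usage_percentages(scores):
    """Map per-PAS scores of one gene to percentages that sum to 100.

    Works for any number of sites; an empty gene yields an empty list.
    """
    if not scores:
        return []
    # Subtract the maximum before exponentiating for numerical stability.
    peak = max(scores)
    weights = [math.exp(s - peak) for s in scores]
    total = sum(weights)
    return [100.0 * w / total for w in weights]


# Hypothetical raw scores for a gene with three competing sites.
percentages = usage_percentages([2.0, 0.5, -1.0])
print(percentages)
```

Because the normalization is taken over all sites of a gene at once, raising one site's score necessarily lowers the others' percentages, which mirrors the competition among PASs that the abstract emphasizes.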

Recessive, Deleterious Variants in SMG8 Expand the Role of Nonsense-Mediated Decay in Developmental Disorders in Humans.

Alzahrani, Fatema; Kuwahara, Hiroyuki; Long, Yongkang; Al-Owain, Mohammed; Tohary, Mohamed; AlSayed, Moeenaldeen; Mahnashi, Mohammed; Fathi, Lana; Alnemer, Maha; Al-Hamed, Mohamed H; Lemire, Gabrielle; Boycott, Kym M; Hashem, Mais; Han, Wenkai; Al-Maawali, Almundher; Al Mahrizi, Feisal; Al-Thihli, Khalid; Gao, Xin; Alkuraya, Fowzan S.

Am J Hum Genet ; 107(6): 1178-1185, 2020 12 03.

Article in English | MEDLINE | ID: mdl-33242396

ABSTRACT

We have previously described a heart-, eye-, and brain-malformation syndrome caused by homozygous loss-of-function variants in SMG9, which encodes a critical component of the nonsense-mediated decay (NMD) machinery. Here, we describe four consanguineous families with four different likely deleterious homozygous variants in SMG8, encoding a binding partner of SMG9. The observed phenotype greatly resembles that linked to SMG9 and comprises severe global developmental delay, microcephaly, facial dysmorphism, and variable congenital heart and eye malformations. RNA-seq analysis revealed a general increase in mRNA expression levels with significant overrepresentation of core NMD substrates. We also identified increased phosphorylation of UPF1, a key SMG1-dependent step in NMD, which most likely represents the loss of SMG8-mediated inhibition of SMG1 kinase activity. Our data show that SMG8 and SMG9 deficiency results in overlapping developmental disorders that most likely converge mechanistically on impaired NMD.

Subject(s)

Developmental Disabilities/genetics; Intracellular Signaling Peptides and Proteins/genetics; Nonsense Mediated mRNA Decay; Adolescent; Brain/abnormalities; Child; Child, Preschool; Consanguinity; Developmental Disabilities/metabolism; Family Health; Female; Gene Deletion; Genetic Linkage; Heart Defects, Congenital/genetics; Homozygote; Humans; Infant; Male; Pedigree; Phenotype; Phosphorylation; RNA Helicases/metabolism; RNA, Messenger/metabolism; RNA-Seq; Trans-Activators/metabolism; Young Adult

Integrative multi-omics analysis of a colon cancer cell line with heterogeneous Wnt activity revealed RUNX2 as an epigenetic regulator of EMT.

Yi, Hongyang; Li, Guipeng; Long, Yongkang; Liang, Weizheng; Cui, Huanhuan; Zhang, Bin; Tan, Ying; Li, Yunfei; Shen, Luochen; Deng, Daqi; Tang, Yisen; Mao, Chenyu; Tian, Shuye; Cai, Yunting; Zhu, Qionghua; Hu, Yuhui; Chen, Wei; Fang, Liang.

Oncogene ; 39(28): 5152-5164, 2020 07.

Article in English | MEDLINE | ID: mdl-32535615

ABSTRACT

The epithelial-mesenchymal transition (EMT) program, which facilitates tumor metastasis, stemness and therapy resistance, is a reversible biological process that is largely orchestrated at the epigenetic level under the regulation of different cell signaling pathways. EMT state is often heterogeneous within individual tumors, though the epigenetic drivers underlying such heterogeneity remain elusive. In colon cancer, hyperactivation of Wnt/β-catenin signaling not only drives tumor initiation, but also promotes metastasis in late stage by promoting the EMT program. However, it is unknown whether intratumorally heterogeneous Wnt activity could directly drive EMT heterogeneity, and, if so, what the underlying epigenetic driver(s) are. Here, by analyzing a phenotypically and molecularly heterogeneous colon cancer cell line using single-cell RNA sequencing, we identified two distinct cell populations with positively correlated Wnt activity and EMT state. Integrative multi-omics analysis of these two cell populations revealed RUNX2 as a critical transcription factor epigenetically driving the EMT heterogeneity. Both in vitro and in vivo genetic perturbation assays validated the EMT-enhancing effect of RUNX2, which remodeled the chromatin landscape and activated a panel of EMT-associated genes through binding to their promoters and/or potential enhancers. Finally, by exploring the clinical data, we showed that RUNX2 expression is positively correlated with metastasis development and poor survival of colon cancer patients, as well as patients afflicted with other types of cancer. Taken together, our work revealed RUNX2 as a new EMT-promoting epigenetic regulator in colon cancer, which may potentially serve as a prognostic marker for tumor metastasis.

Subject(s)

Colonic Neoplasms/genetics; Core Binding Factor Alpha 1 Subunit/genetics; Epigenomics/methods; Epithelial-Mesenchymal Transition/genetics; Gene Expression Profiling/methods; Wnt Signaling Pathway/genetics; beta Catenin/genetics; Animals; Caco-2 Cells; Cell Line, Tumor; Colonic Neoplasms/pathology; Female; Gene Expression Regulation, Neoplastic; HCT116 Cells; HEK293 Cells; HeLa Cells; Heterografts; Humans; Kaplan-Meier Estimate; MCF-7 Cells; Mice

FAM46B is a prokaryotic-like cytoplasmic poly(A) polymerase essential in human embryonic stem cells.

Hu, Jia-Li; Liang, He; Zhang, Hong; Yang, Ming-Zhu; Sun, Wei; Zhang, Peng; Luo, Li; Feng, Jian-Xiong; Bai, Huajun; Liu, Fang; Zhang, Tianpeng; Yang, Jin-Yu; Gao, Qingsong; Long, Yongkang; Ma, Xiao-Yan; Chen, Yang; Zhong, Qian; Yu, Bing; Liao, Shuang; Wang, Yongbo; Zhao, Yong; Zeng, Mu-Sheng; Cao, Nan; Wang, Jichang; Chen, Wei; Yang, Huang-Tian; Gao, Song.

Nucleic Acids Res ; 48(5): 2733-2748, 2020 03 18.

Article in English | MEDLINE | ID: mdl-32009146

ABSTRACT

Family with sequence similarity (FAM46) proteins are newly identified metazoan-specific poly(A) polymerases (PAPs). Although predicted as Gld-2-like eukaryotic non-canonical PAPs, the detailed architecture of FAM46 proteins is still unclear. Exact biological functions for most FAM46 proteins also remain largely unknown. Here, we report the first crystal structure of a FAM46 protein, FAM46B. FAM46B is composed of a prominently larger N-terminal catalytic domain as compared to known eukaryotic PAPs, and a C-terminal helical domain. FAM46B resembles prokaryotic PAP/CCA-adding enzymes in overall folding as well as certain inter-domain connections, which distinguishes FAM46B from other eukaryotic non-canonical PAPs. Biochemical analysis reveals that FAM46B is an active PAP, and prefers adenosine-rich substrate RNAs. FAM46B is uniquely and highly expressed in human pre-implantation embryos and pluripotent stem cells, but sharply down-regulated following differentiation. FAM46B is localized to both cell nucleus and cytosol, and is indispensable for the viability of human embryonic stem cells. Knock-out of FAM46B is lethal. Knock-down of FAM46B induces apoptosis and restricts protein synthesis. The identification of the bacterial-like FAM46B, as a pluripotent stem cell-specific PAP involved in the maintenance of translational efficiency, provides important clues for further functional studies of this PAP in the early embryonic development of higher eukaryotes.

Subject(s)

Human Embryonic Stem Cells/metabolism; Nucleotidyltransferases/metabolism; Polynucleotide Adenylyltransferase/metabolism; Prokaryotic Cells/metabolism; Animals; Biocatalysis; Cell Line; Cell Survival; Embryonic Development; Humans; Models, Molecular; Nucleotidyltransferases/chemistry; Nucleotidyltransferases/genetics; Polynucleotide Adenylyltransferase/chemistry; Protein Binding; Protein Domains; RNA/metabolism; Substrate Specificity; Xenopus

A deep learning framework to predict binding preference of RNA constituents on protein surface.

Lam, Jordy Homing; Li, Yu; Zhu, Lizhe; Umarov, Ramzan; Jiang, Hanlun; Héliou, Amélie; Sheong, Fu Kit; Liu, Tianyun; Long, Yongkang; Li, Yunfei; Fang, Liang; Altman, Russ B; Chen, Wei; Huang, Xuhui; Gao, Xin.

Nat Commun ; 10(1): 4941, 2019 10 30.

Article in English | MEDLINE | ID: mdl-31666519

ABSTRACT

Protein-RNA interaction plays important roles in post-transcriptional regulation. However, the task of predicting these interactions given a protein structure is difficult. Here we show that, by leveraging a deep learning model NucleicNet, attributes such as binding preference of RNA backbone constituents and different bases can be predicted from local physicochemical characteristics of protein structure surface. On a diverse set of challenging RNA-binding proteins, including Fem-3-binding-factor 2, Argonaute 2 and Ribonuclease III, NucleicNet can accurately recover interaction modes discovered by structural biology experiments. Furthermore, we show that, without seeing any in vitro or in vivo assay data, NucleicNet can still achieve consistency with experiments, including RNAcompete, Immunoprecipitation Assay, and siRNA Knockdown Benchmark. NucleicNet can thus serve to provide quantitative fitness of RNA sequences for given binding pockets or to predict potential binding pockets and binding RNAs for previously unknown RNA binding proteins.

Subject(s)

Argonaute Proteins/metabolism; Deep Learning; RNA/metabolism; Ribonuclease III/metabolism; Adenine/metabolism; Animals; Area Under Curve; Cytosine/metabolism; Gene Knockdown Techniques; Guanine/metabolism; Humans; Mice; Phosphates/metabolism; Protein Binding; RNA, Small Interfering; RNA-Binding Proteins/metabolism; ROC Curve; Ribose/metabolism; Uracil/metabolism
