1.
Genome Res ; 33(4): 644-657, 2023 04.
Article in English | MEDLINE | ID: mdl-37117035

ABSTRACT

Alternative polyadenylation (APA) enables a gene to generate multiple transcripts with different 3' ends, and its pattern varies across cell types and conditions. Many computational methods have been developed to characterize sample-specific APA from the corresponding RNA-seq data, but they suffer from high error rates in both polyadenylation site (PAS) identification and quantification of PAS usage (PAU), and from bias toward 3' untranslated regions. Here we developed a tool for APA identification and quantification (APAIQ) from RNA-seq data, which can accurately identify PASs and quantify PAU in a transcriptome-wide manner. Using 3' end-seq data as the benchmark, we showed that APAIQ outperforms current methods, including DaPars2, Aptardi, mountainClimber, SANPolyA, and QAPA, on PAS identification and PAU quantification. Finally, applying APAIQ to 421 RNA-seq samples from liver cancer patients, we identified >540 tumor-associated APA events and experimentally validated two intronic polyadenylation candidates, demonstrating its capacity to unveil cancer-related APA in a large-scale RNA-seq data set.
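
As a rough illustration of what "PAS usage" means, the sketch below computes PAU as the fraction of a gene's 3' end-supporting reads assigned to each site. This definition, the function name `pas_usage`, and the read counts are assumptions made for illustration; the abstract does not describe how APAIQ itself estimates PAU.

```python
def pas_usage(read_counts):
    """Return the usage fraction of each polyadenylation site in one gene.

    read_counts maps a site label to the number of reads supporting it.
    Both the labels and the counts here are hypothetical.
    """
    total = sum(read_counts.values())
    if total == 0:
        # No supporting reads: report zero usage rather than dividing by zero.
        return {site: 0.0 for site in read_counts}
    return {site: count / total for site, count in read_counts.items()}


# Hypothetical gene with three sites, including an intronic one.
usage = pas_usage({"proximal": 30, "intronic": 10, "distal": 60})
for site, fraction in usage.items():
    print(f"{site}: {fraction:.2f}")
```

Under this definition the usage values of all sites in a gene sum to 1, so a shift toward one site necessarily lowers the usage of the others.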


Subjects
Neoplasms; Transcriptome; Humans; Polyadenylation; RNA-Seq; Sequence Analysis, RNA/methods; Neoplasms/genetics; 3' Untranslated Regions
2.
Genomics Proteomics Bioinformatics ; 20(5): 959-973, 2022 10.
Article in English | MEDLINE | ID: mdl-36528241

ABSTRACT

The accurate annotation of transcription start sites (TSSs) and their usage is critical for the mechanistic understanding of gene regulation in different biological contexts. To this end, specific high-throughput experimental technologies have been developed to capture TSSs in a genome-wide manner, and various computational tools have also been developed for in silico prediction of TSSs based solely on genomic sequences. Most of these computational tools cast the problem as a binary classification task on a balanced dataset, which results in large numbers of false positive predictions when they are applied at the genome scale. Here, we present DeeReCT-TSS, a deep learning-based method that is capable of identifying TSSs across the whole genome based on both DNA sequence and conventional RNA sequencing data. We show that by effectively incorporating these two sources of information, DeeReCT-TSS significantly outperforms other solely sequence-based methods on the precise annotation of TSSs used in different cell types. Furthermore, we develop a meta-learning-based extension for simultaneous TSS annotations on 10 cell types, which enables the identification of cell type-specific TSSs. Finally, we demonstrate the high precision of DeeReCT-TSS on two independent datasets by correlating our predicted TSSs with experimentally defined TSS chromatin states. The source code for DeeReCT-TSS is available at https://github.com/JoshuaChou2018/DeeReCT-TSS_release and https://ngdc.cncb.ac.cn/biocode/tools/BT007316.
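
The point about balanced training sets and genome-scale false positives can be made concrete with a small calculation. The sketch below is illustrative only: the sensitivity, specificity, and class sizes are hypothetical numbers, not values reported for any tool named in the abstract.

```python
def precision(sensitivity, specificity, n_pos, n_neg):
    """Fraction of positive calls that are true positives."""
    true_pos = sensitivity * n_pos
    false_pos = (1.0 - specificity) * n_neg
    return true_pos / (true_pos + false_pos)


# Hypothetical classifier: 95% sensitivity and 95% specificity.
# Balanced benchmark: as many true sites as non-sites.
balanced = precision(0.95, 0.95, n_pos=1_000, n_neg=1_000)

# Genome-scale scan: true sites are vastly outnumbered by non-sites.
genome_scale = precision(0.95, 0.95, n_pos=1_000, n_neg=1_000_000)

print(f"balanced benchmark precision: {balanced:.3f}")
print(f"genome-scale precision:       {genome_scale:.3f}")
```

With these assumed numbers the same classifier goes from 95% precision on the balanced benchmark to under 2% on the imbalanced scan, which is the failure mode the abstract attributes to sequence-only tools.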


Subjects
Genomics; RNA-Seq; Base Sequence; Transcription Initiation Site; Sequence Analysis, RNA/methods
3.
Genomics Proteomics Bioinformatics ; 20(3): 483-495, 2022 06.
Article in English | MEDLINE | ID: mdl-33662629

ABSTRACT

Alternative polyadenylation (APA) is a crucial step in post-transcriptional regulation. Previous bioinformatic studies have mainly focused on the recognition of polyadenylation sites (PASs) in a given genomic sequence, which is a binary classification problem. Recently, computational methods for predicting the usage level of alternative PASs in the same gene have been proposed. However, all of them cast the problem as a non-quantitative pairwise comparison task and do not take the competition among multiple PASs into account. To address this, here we propose a deep learning architecture, Deep Regulatory Code and Tools for Alternative Polyadenylation (DeeReCT-APA), to quantitatively predict the usage of all alternative PASs of a given gene. To accommodate different genes with potentially different numbers of PASs, DeeReCT-APA treats the problem as a regression task with a variable-length target. Based on a convolutional neural network-long short-term memory (CNN-LSTM) architecture, DeeReCT-APA extracts sequence features with CNN layers, uses bidirectional LSTM to explicitly model the interactions among competing PASs, and outputs percentage scores representing the usage levels of all PASs of a gene. In addition to the fact that only our method can quantitatively predict the usage of all the PASs within a gene, we show that our method consistently outperforms other existing methods on three different tasks for which they are trained: pairwise comparison task, highest usage prediction task, and ranking task. Finally, we demonstrate that our method can be used to predict the effect of genetic variations on APA patterns and sheds light on future mechanistic understanding in APA regulation. Our code and data are available at https://github.com/lzx325/DeeReCT-APA-repo.
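
To illustrate how percentage scores can be produced for a variable number of competing sites, the sketch below normalizes one raw score per PAS with a softmax. This is an assumption made for illustration: the abstract states only that the model outputs percentage scores for all PASs of a gene, and the function name `usage_percentages` and the input scores are hypothetical. The CNN and bidirectional LSTM layers that would produce such scores are not modeled here.

```python
import math


def usage_percentages(scores):
    """Convert per-site raw scores into usage percentages summing to 100.

    Works for any number of sites, so genes with different numbers of
    PASs can share the same output rule.
    """
    if not scores:
        return []
    top = max(scores)  # subtract the maximum for numerical stability
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return [100.0 * w / total for w in weights]


# Hypothetical raw scores for a gene with two sites and one with four.
two_sites = usage_percentages([1.2, 0.3])
four_sites = usage_percentages([0.5, 2.0, -1.0, 0.5])
print([round(p, 1) for p in two_sites])
print([round(p, 1) for p in four_sites])
```

Because the normalization is taken over all sites of a gene at once, raising one site's score lowers the percentages of the others, which reflects the competition among PASs that the abstract emphasizes.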


Subjects
Deep Learning; Polyadenylation; Gene Expression Regulation; Neural Networks, Computer; Computational Biology/methods; 3' Untranslated Regions
4.
Am J Hum Genet ; 107(6): 1178-1185, 2020 12 03.
Article in English | MEDLINE | ID: mdl-33242396

ABSTRACT

We have previously described a heart-, eye-, and brain-malformation syndrome caused by homozygous loss-of-function variants in SMG9, which encodes a critical component of the nonsense-mediated decay (NMD) machinery. Here, we describe four consanguineous families with four different likely deleterious homozygous variants in SMG8, encoding a binding partner of SMG9. The observed phenotype greatly resembles that linked to SMG9 and comprises severe global developmental delay, microcephaly, facial dysmorphism, and variable congenital heart and eye malformations. RNA-seq analysis revealed a general increase in mRNA expression levels with significant overrepresentation of core NMD substrates. We also identified increased phosphorylation of UPF1, a key SMG1-dependent step in NMD, which most likely represents the loss of SMG8-mediated inhibition of SMG1 kinase activity. Our data show that SMG8 and SMG9 deficiency results in overlapping developmental disorders that most likely converge mechanistically on impaired NMD.


Subjects
Developmental Disabilities/genetics; Intracellular Signaling Peptides and Proteins/genetics; Nonsense Mediated mRNA Decay; Adolescent; Brain/abnormalities; Child; Child, Preschool; Consanguinity; Developmental Disabilities/metabolism; Family Health; Female; Gene Deletion; Genetic Linkage; Heart Defects, Congenital/genetics; Homozygote; Humans; Infant; Male; Pedigree; Phenotype; Phosphorylation; RNA Helicases/metabolism; RNA, Messenger/metabolism; RNA-Seq; Trans-Activators/metabolism; Young Adult
5.
Oncogene ; 39(28): 5152-5164, 2020 07.
Article in English | MEDLINE | ID: mdl-32535615

ABSTRACT

The epithelial-mesenchymal transition (EMT) program, which facilitates tumor metastasis, stemness, and therapy resistance, is a reversible biological process that is largely orchestrated at the epigenetic level under the regulation of different cell signaling pathways. The EMT state is often heterogeneous within individual tumors, though the epigenetic drivers underlying such heterogeneity remain elusive. In colon cancer, hyperactivation of Wnt/β-catenin signaling not only drives tumor initiation but also promotes metastasis at late stages by promoting the EMT program. However, it is unknown whether intratumorally heterogeneous Wnt activity could directly drive EMT heterogeneity and, if so, what the underlying epigenetic driver(s) are. Here, by analyzing a phenotypically and molecularly heterogeneous colon cancer cell line using single-cell RNA sequencing, we identified two distinct cell populations with positively correlated Wnt activity and EMT state. Integrative multi-omics analysis of these two cell populations revealed RUNX2 as a critical transcription factor epigenetically driving the EMT heterogeneity. Both in vitro and in vivo genetic perturbation assays validated the EMT-enhancing effect of RUNX2, which remodeled the chromatin landscape and activated a panel of EMT-associated genes through binding to their promoters and/or potential enhancers. Finally, by exploring the clinical data, we showed that RUNX2 expression is positively correlated with metastasis development and poor survival of colon cancer patients, as well as patients afflicted with other types of cancer. Taken together, our work revealed RUNX2 as a new EMT-promoting epigenetic regulator in colon cancer, which may potentially serve as a prognostic marker for tumor metastasis.


Subjects
Colonic Neoplasms/genetics; Core Binding Factor Alpha 1 Subunit/genetics; Epigenomics/methods; Epithelial-Mesenchymal Transition/genetics; Gene Expression Profiling/methods; Wnt Signaling Pathway/genetics; beta Catenin/genetics; Animals; Caco-2 Cells; Cell Line, Tumor; Colonic Neoplasms/pathology; Female; Gene Expression Regulation, Neoplastic; HCT116 Cells; HEK293 Cells; HeLa Cells; Heterografts; Humans; Kaplan-Meier Estimate; MCF-7 Cells; Mice
6.
Nucleic Acids Res ; 48(5): 2733-2748, 2020 03 18.
Article in English | MEDLINE | ID: mdl-32009146

ABSTRACT

Family with sequence similarity (FAM46) proteins are newly identified metazoan-specific poly(A) polymerases (PAPs). Although predicted as Gld-2-like eukaryotic non-canonical PAPs, the detailed architecture of FAM46 proteins is still unclear. The exact biological functions of most FAM46 proteins also remain largely unknown. Here, we report the first crystal structure of a FAM46 protein, FAM46B. FAM46B is composed of an N-terminal catalytic domain that is prominently larger than those of known eukaryotic PAPs, and a C-terminal helical domain. FAM46B resembles prokaryotic PAP/CCA-adding enzymes in overall folding as well as certain inter-domain connections, which distinguishes FAM46B from other eukaryotic non-canonical PAPs. Biochemical analysis reveals that FAM46B is an active PAP and prefers adenosine-rich substrate RNAs. FAM46B is uniquely and highly expressed in human pre-implantation embryos and pluripotent stem cells, but sharply down-regulated following differentiation. FAM46B is localized to both the cell nucleus and the cytosol, and is indispensable for the viability of human embryonic stem cells. Knock-out of FAM46B is lethal. Knock-down of FAM46B induces apoptosis and restricts protein synthesis. The identification of the bacterial-like FAM46B as a pluripotent stem cell-specific PAP involved in the maintenance of translational efficiency provides important clues for further functional studies of this PAP in the early embryonic development of higher eukaryotes.


Subjects
Human Embryonic Stem Cells/metabolism; Nucleotidyltransferases/metabolism; Polynucleotide Adenylyltransferase/metabolism; Prokaryotic Cells/metabolism; Animals; Biocatalysis; Cell Line; Cell Survival; Embryonic Development; Humans; Models, Molecular; Nucleotidyltransferases/chemistry; Nucleotidyltransferases/genetics; Polynucleotide Adenylyltransferase/chemistry; Protein Binding; Protein Domains; RNA/metabolism; Substrate Specificity; Xenopus
7.
Nat Commun ; 10(1): 4941, 2019 10 30.
Article in English | MEDLINE | ID: mdl-31666519

ABSTRACT

Protein-RNA interaction plays important roles in post-transcriptional regulation. However, the task of predicting these interactions given a protein structure is difficult. Here we show that, by leveraging a deep learning model NucleicNet, attributes such as binding preference of RNA backbone constituents and different bases can be predicted from local physicochemical characteristics of protein structure surface. On a diverse set of challenging RNA-binding proteins, including Fem-3-binding-factor 2, Argonaute 2 and Ribonuclease III, NucleicNet can accurately recover interaction modes discovered by structural biology experiments. Furthermore, we show that, without seeing any in vitro or in vivo assay data, NucleicNet can still achieve consistency with experiments, including RNAcompete, Immunoprecipitation Assay, and siRNA Knockdown Benchmark. NucleicNet can thus serve to provide quantitative fitness of RNA sequences for given binding pockets or to predict potential binding pockets and binding RNAs for previously unknown RNA binding proteins.


Subjects
Argonaute Proteins/metabolism; Deep Learning; RNA/metabolism; Ribonuclease III/metabolism; Adenine/metabolism; Animals; Area Under Curve; Cytosine/metabolism; Gene Knockdown Techniques; Guanine/metabolism; Humans; Mice; Phosphates/metabolism; Protein Binding; RNA, Small Interfering; RNA-Binding Proteins/metabolism; ROC Curve; Ribose/metabolism; Uracil/metabolism