ABSTRACT
Drosophila Dicer-1 produces microRNAs (miRNAs) from pre-miRNA, whereas Dicer-2 generates small interfering RNAs (siRNAs) from long dsRNA. Alternative splicing of the loquacious (loqs) mRNA generates three distinct Dicer partner proteins. To understand the function of each, we constructed flies expressing Loqs-PA, Loqs-PB, or Loqs-PD. Loqs-PD promotes both endo- and exo-siRNA production by Dicer-2. Loqs-PA or Loqs-PB is required for viability, but the proteins are not fully redundant: a specific subset of miRNAs requires Loqs-PB. Surprisingly, Loqs-PB tunes where Dicer-1 cleaves pre-miR-307a, generating a longer miRNA isoform with a distinct seed sequence and target specificity. The longer form of miR-307a represses glycerol kinase and taranis mRNA expression. The mammalian Dicer-partner TRBP, a Loqs-PB homolog, similarly tunes where Dicer cleaves pre-miR-132. Thus, Dicer-binding partner proteins change the choice of cleavage site by Dicer, producing miRNAs with target specificities different from those made by Dicer alone or Dicer bound to alternative protein partners.
Subjects
DEAD-box RNA Helicases/metabolism, Drosophila Proteins/metabolism, Drosophila melanogaster/metabolism, RNA Helicases/metabolism, RNA-Binding Proteins/metabolism, Ribonuclease III/metabolism, Animals, Base Sequence, Drosophila melanogaster/genetics, Female, Humans, Male, Mice, MicroRNAs/metabolism, Molecular Sequence Data
ABSTRACT
Numerous chromatin regulators are required for embryonic stem (ES) cell self-renewal and pluripotency, but few have been studied in detail. Here, we examine the roles of several chromatin regulators whose loss affects the pluripotent state of ES cells. We find that Mbd3 and Brg1 antagonistically regulate a common set of genes by regulating promoter nucleosome occupancy. Furthermore, both Mbd3 and Brg1 play key roles in the biology of 5-hydroxymethylcytosine (5hmC): Mbd3 colocalizes with Tet1 and 5hmC in vivo, Mbd3 knockdown preferentially affects expression of 5hmC-marked genes, Mbd3 localization is Tet1-dependent, and Mbd3 preferentially binds to 5hmC relative to 5-methylcytosine in vitro. Finally, both Mbd3 and Brg1 are themselves required for normal levels of 5hmC in vivo. Together, our results identify an effector for 5hmC, and reveal that control of gene expression by antagonistic chromatin regulators is a surprisingly common regulatory strategy in ES cells.
Subjects
Cytosine/analogs & derivatives, DNA-Binding Proteins/metabolism, Embryonic Stem Cells/metabolism, Mi-2 Nucleosome Remodeling and Deacetylase Complex/metabolism, Transcription Factors/metabolism, 5-Methylcytosine/analogs & derivatives, Animals, Chromatin Assembly and Disassembly, Cytosine/metabolism, DNA Helicases/metabolism, DNA-Binding Proteins/genetics, Gene Knockdown Techniques, Humans, Mice, Nuclear Proteins/metabolism, Proto-Oncogene Proteins/genetics, Proto-Oncogene Proteins/metabolism, RNA Polymerase II/metabolism
ABSTRACT
MOTIVATION: The Full-text index in Minute space (FM-index) is a memory-efficient data structure widely used in bioinformatics for solving the fundamental pattern-matching task of searching for short patterns within a long reference. With the demand for short query patterns, the k-ordered concept has been proposed for FM-indexes. However, few state-of-the-art construction algorithms fully exploit this idea to achieve significant speedups in the pan-genome era. RESULTS: We introduce the k-ordered Induced Suffix Sorting (kISS) for efficient construction and utilization of k-ordered FM-indexes. We present an algorithmic workflow for building k-ordered suffix arrays, incorporating two novel strategies to improve time and memory efficiency. We also demonstrate the compatibility of integrating k-ordered FM-indexes with locate operations in FMtree. Experiments show that kISS can improve the construction time, and the generated k-ordered suffix array can also be applied to FMtree without any additional computation or memory usage. AVAILABILITY: https://github.com/jhhung/kISS. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
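To make the k-ordered idea concrete, the following is a minimal Python sketch, not the kISS algorithm itself: it builds a k-ordered suffix array by naively sorting suffixes on their first k characters only (kISS uses induced sorting, which is far more efficient), then locates a pattern of length at most k by binary search. The function names `k_ordered_suffix_array` and `find_occurrences` are hypothetical and chosen for illustration.

```python
from bisect import bisect_left, bisect_right


def k_ordered_suffix_array(text: str, k: int) -> list[int]:
    """Sort suffix start positions by their first k characters only.

    sorted() is stable, so suffixes that share the same k-prefix
    keep their original text order instead of being fully sorted.
    """
    return sorted(range(len(text)), key=lambda i: text[i:i + k])


def find_occurrences(text: str, sa: list[int], k: int, pattern: str) -> list[int]:
    """Return all start positions of pattern, valid only for len(pattern) <= k."""
    m = len(pattern)
    if m > k:
        raise ValueError("a k-ordered index only supports patterns of length <= k")
    # Truncating the sort keys to m <= k characters preserves their order,
    # so the list of m-prefixes is sorted and can be binary-searched.
    prefixes = [text[i:i + m] for i in sa]
    lo = bisect_left(prefixes, pattern)
    hi = bisect_right(prefixes, pattern)
    return sorted(sa[lo:hi])


if __name__ == "__main__":
    text = "banana"
    sa = k_ordered_suffix_array(text, k=2)
    print(sa)
    print(find_occurrences(text, sa, 2, "an"))
```

For "banana" with k = 2, the suffixes starting at positions 1 and 3 both begin with "an" and therefore stay in text order, whereas a fully sorted suffix array would place position 3 first. This relaxed ordering is what a k-ordered index trades for cheaper construction when queries are known to be short.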
ABSTRACT
The rapid proliferation of new psychoactive substances (NPS) poses significant challenges to conventional mass-spectrometry-based identification methods due to the absence of reference spectra for these emerging substances. This paper introduces PS2MS, an AI-powered predictive system designed specifically to address the limitations of identifying the emergence of unidentified novel illicit drugs. PS2MS builds a synthetic NPS database by enumerating feasible derivatives of known substances and uses deep learning to generate mass spectra and chemical fingerprints. When the mass spectrum of an analyte does not match any known reference, PS2MS simultaneously examines the chemical fingerprint and mass spectrum against the putative NPS database using integrated metrics to deduce possible identities. Experimental results affirm the effectiveness of PS2MS in identifying cathinone derivatives within real evidence specimens, signifying its potential for practical use in identifying emerging drugs of abuse for researchers and forensic experts.
Subjects
Deep Learning, Illicit Drugs, Liquid Chromatography/methods, Psychotropic Drugs/analysis, Mass Spectrometry/methods, Illicit Drugs/analysis, Substance Abuse Detection/methods
ABSTRACT
Uridylation of RNA species represents an emerging theme in post-transcriptional gene regulation. In the microRNA pathway, such modifications regulate small RNA biogenesis and stability in plants, worms, and mammals. Here, we report Tailor, a uridylyltransferase that is required for the majority of 3' end modifications of microRNAs in Drosophila and predominantly targets precursor hairpins. Uridylation modulates the characteristic two-nucleotide 3' overhang of microRNA hairpins, which regulates processing by Dicer-1 and destabilizes RNA hairpins. Tailor preferentially uridylates mirtron hairpins, thereby impeding the production of non-canonical microRNAs. Mirtron selectivity is explained by primary sequence specificity of Tailor, selecting substrates ending with a 3' guanosine. In contrast to mirtrons, conserved Drosophila precursor microRNAs are significantly depleted in 3' guanosine, thereby escaping regulatory uridylation. Our data support the hypothesis that evolutionary adaptation to Tailor-directed uridylation shapes the nucleotide composition of precursor microRNA 3' ends. Hence, hairpin uridylation may serve as a barrier for the de novo creation of microRNAs in Drosophila.
Subjects
Drosophila Proteins/metabolism, Drosophila melanogaster/metabolism, MicroRNAs/chemistry, MicroRNAs/metabolism, RNA Nucleotidyltransferases/metabolism, Amino Acid Sequence, Animals, Base Sequence, Cell Line, Drosophila Proteins/antagonists & inhibitors, Drosophila Proteins/genetics, Drosophila melanogaster/genetics, Drosophila melanogaster/physiology, Female, Fertility/genetics, Fertility/physiology, Gene Knockdown Techniques, Insect Genes, Male, MicroRNAs/genetics, Molecular Sequence Data, Mutation, Nucleic Acid Conformation, RNA Nucleotidyltransferases/antagonists & inhibitors, RNA Nucleotidyltransferases/genetics, Post-Transcriptional RNA Processing, RNA Stability, Messenger RNA/genetics, Messenger RNA/metabolism, Small Interfering RNA/genetics, Substrate Specificity
ABSTRACT
MOTIVATION: Cross-sample comparisons or large-scale meta-analyses based on next-generation sequencing (NGS) require replicable and universal data preprocessing, including the removal of adapter fragments from contaminated reads (i.e. adapter trimming). Modern adapter trimmers require users to provide candidate adapter sequences for each sample, but these are sometimes unavailable or falsely documented in repositories (such as GEO or SRA); large-scale meta-analyses are therefore jeopardized by suboptimal adapter trimming. RESULTS: Here we introduce a set of fast and accurate adapter detection and trimming algorithms that require no a priori adapter sequences. These algorithms were implemented in modern C++ with SIMD and multithreading for speed. Our experiments and benchmarks show that the implementation (i.e. EARRINGS), without being given any hint of adapter sequences, reaches accuracy comparable to, and throughput higher than, existing adapter trimmers. EARRINGS is particularly useful in meta-analyses of large batches of datasets and can be incorporated into sequence analysis pipelines at any scale. AVAILABILITY AND IMPLEMENTATION: EARRINGS is open-source software and is available at https://github.com/jhhung/EARRINGS. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
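One general way to locate adapters without knowing their sequence, for paired-end data, is to exploit the fact that when the DNA fragment is shorter than the read length, the first f bases of read 1 are the reverse complement of the first f bases of read 2, and everything after position f is adapter. The Python sketch below illustrates that idea only; it is an assumption-laden illustration and not the EARRINGS implementation. The function names, the `min_len` default, and the `max_mismatch_rate` threshold are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def hamming(a: str, b: str) -> int:
    """Count mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def detect_fragment_length(read1, read2, min_len=10, max_mismatch_rate=0.1):
    """Find f such that read1[:f] is (nearly) the reverse complement of read2[:f].

    Returns None when no such f exists, i.e. no adapter read-through is detected.
    """
    for f in range(min(len(read1), len(read2)), min_len - 1, -1):
        candidate = reverse_complement(read2[:f])
        if hamming(read1[:f], candidate) <= max_mismatch_rate * f:
            return f
    return None


def trim_pair(read1, read2):
    """Trim both mates to the detected fragment length, if any."""
    f = detect_fragment_length(read1, read2)
    if f is None:
        return read1, read2
    return read1[:f], read2[:f]


if __name__ == "__main__":
    fragment = "ACGGTCATTGCAGGCTAACT"
    # The 5-nt tails below are made-up stand-ins for adapter read-through.
    r1 = fragment + "AGATC"
    r2 = reverse_complement(fragment) + "CTGTC"
    print(trim_pair(r1, r2))
```

A production trimmer would also need to handle base qualities, indels, and low-complexity reads that match by chance; the sketch ignores all of these.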
ABSTRACT
RATIONALE: Thermogravimetry (TG) combined with electrospray and atmospheric pressure chemical ionization (ESI+APCI) mass spectrometry (MS) was developed to rapidly characterize thermal decomposition products of synthetic polymers and plastic products. The ESI-based TG-MS method is useful for characterizing thermally labile, nonvolatile, and polar compounds over an extensive mass range, and the APCI-based TG-MS counterpart is useful for characterizing volatile and nonpolar compounds. Both polar and nonpolar compounds can be simultaneously detected by ESI+APCI-based TG-MS. METHODS: Analytes with different volatility were produced from TG operated at different temperatures and delivered through a heated stainless-steel tube to the ESI+APCI source, where they reacted with the primary charged species generated from electrospray and atmospheric pressure chemical ionization (ESI+APCI) of solvent and nitrogen. The analyte ions were then detected by an ion trap mass spectrometer. RESULTS: A semi-volatile PEG 600 standard was used as the sample, and protonated and sodiated molecular ions together with adduct ions including [(PEG)n + 15]+, [(PEG)n + 18]+, and [(PEG)n + 29]+ were detected by TG-ESI+APCI-MS. The technique was further utilized to characterize thermal decomposition products of nonvolatile polypropylene glycol (PPG) and polystyrene (PS) standards, as well as a PS-made water cup and coffee cup lid. The characteristic fragments of PPG and PS with mass differences of 58 and 104 between respective ion peaks were detected at the maximum decomposition temperature (Tmax). CONCLUSIONS: The information obtained from the TG-ESI+APCI-MS analysis is useful in rapidly distinguishing different types of polymers and their products. In addition, the signals of the additives in the polymer products, including antioxidants and plasticizers, were also detected before the TG temperature reached Tmax.
Subjects
Atmospheric Pressure, Electrospray Ionization Mass Spectrometry, Polymers, Solvents, Electrospray Ionization Mass Spectrometry/methods, Thermogravimetry
ABSTRACT
Pluripotency and cell fates can be modulated through the regulation of super-enhancers; however, the underlying mechanisms are unclear. Here, we show a novel mechanism in which Ash2l directly binds to super-enhancers of several stemness genes to regulate pluripotency and self-renewal in pluripotent stem cells. Ash2l recruits Oct4/Sox2/Nanog (OSN) to form an Ash2l/OSN complex at the super-enhancers of Jarid2, Nanog, Sox2 and Oct4, which drives enhancer activation and upregulation of stemness genes and maintains the pluripotent circuitry. Ash2l knockdown abrogates OSN recruitment to all super-enhancers and further hinders enhancer activation. In addition, CRISPRi/dCas9-mediated blocking of Ash2l-binding motifs at these super-enhancers also prevents OSN recruitment and enhancer activation, validating that Ash2l directly binds to super-enhancers and initiates the pluripotency network. Transfection of Ash2l carrying the W118A mutation, which disrupts the Ash2l-Oct4 interaction, fails to rescue Ash2l-driven enhancer activation and pluripotent gene upregulation in Ash2l-depleted pluripotent stem cells. Together, our data demonstrate that Ash2l forms an enhancer-bound Ash2l/OSN complex that can drive enhancer activation and govern the pluripotency network and stemness circuitry.
Subjects
DNA-Binding Proteins/genetics, Genetic Enhancer Elements, Mouse Embryonic Stem Cells/metabolism, Octamer Transcription Factor-3/genetics, Transcription Factors/genetics, Animals, CRISPR-Cas Systems/genetics, Cell Differentiation/genetics, Cell Lineage/genetics, Cell Self Renewal/genetics, Cellular Reprogramming/genetics, Genetic Enhancer Elements/genetics, Developmental Gene Expression Regulation/genetics, Humans, Mice, Mutation/genetics, Nanog Homeobox Protein/genetics, Pluripotent Stem Cells/metabolism, SOXB1 Transcription Factors/genetics, Transfection
ABSTRACT
Traumatic brain injury is known to reprogram the epigenome. Chromatin immunoprecipitation-sequencing of histone H3 lysine 27 acetylation (H3K27ac) and tri-methylation of histone H3 at lysine 4 (H3K4me3) marks was performed to address the transcriptional regulation of candidate regeneration-associated genes. In this study, we identify a novel enhancer region for induced WNT3A transcription during regeneration of injured cortical neurons. We further demonstrate increased mono-methylation of histone H3 at lysine 4 (H3K4me1) at this enhancer, concomitant with a topological interaction between sub-regions of this enhancer and the promoter of the WNT3A gene. Together, this study reports a novel mechanism for WNT3A gene transcription and reveals a potential therapeutic intervention for neuronal regeneration.
Subjects
Traumatic Brain Injuries/genetics, Histones/metabolism, Neurons/physiology, Wnt3A Protein/genetics, Acetylation, Animals, Traumatic Brain Injuries/metabolism, Chromatin Immunoprecipitation, Animal Disease Models, Genetic Enhancer Elements, Genetic Epigenesis, Methylation, Neurons/metabolism, Genetic Promoter Regions, Rats, Sprague-Dawley Rats, Regeneration
ABSTRACT
MOTIVATION: The Full-text index in Minute space (FM-index) derived from the Burrows-Wheeler transform (BWT) is broadly used for fast string matching in large genomes or a huge set of sequencing reads. Several graphics processing unit (GPU) accelerated aligners based on the FM-index have been proposed recently; however, the construction of the index is still handled by the central processing unit (CPU), only parallelized at the data level (e.g. by performing blockwise suffix sorting on the GPU), or not scalable for large genomes. RESULTS: To fulfill the need for a more practical, hardware-parallelizable indexing and matching approach, we herein propose sBWT, based on a BWT variant (i.e. the Schindler transform) that can be built with highly simplified hardware-acceleration-friendly algorithms and still suffices for accurate and fast string matching in repetitive references. In our tests, the implementation achieves significant speedups in indexing and searching compared with other BWT-based tools and can be applied to a variety of domains. AVAILABILITY AND IMPLEMENTATION: sBWT is implemented in C++ with CPU-only and GPU-accelerated versions. sBWT is open-source software and is available at http://jhhung.github.io/sBWT/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online. CONTACT: chyee@ntu.edu.tw or jhhung@nctu.edu.tw (also juihunghung@gmail.com).
Subjects
Algorithms, Software, Abstracting and Indexing, Genome, Humans
ABSTRACT
Small silencing RNAs, including microRNAs, endogenous small interfering RNAs (endo-siRNAs) and Piwi-interacting RNAs (piRNAs), have been shown to play important roles in fine-tuning gene expression, defending against viruses and controlling transposons. Loss of small silencing RNAs or components in their pathways often leads to severe developmental defects, including lethality and sterility. Recently, non-templated addition of nucleotides to the 3' end, namely tailing, was found to be associated with the processing and stability of small silencing RNAs. Next-generation sequencing has made it possible to detect such modifications at nucleotide resolution at an unprecedented throughput. Unfortunately, detecting such events from millions of short reads confounded by sequencing errors and RNA editing is still a tricky problem. Here, we developed a computational framework, Tailor, driven by an efficient and accurate aligner specifically designed for capturing tailing events directly from the alignments without extensive post-processing. The performance of Tailor was fully tested and compared favorably with other general-purpose aligners using both simulated and real datasets for tailing analysis. Moreover, to show the broad utility of Tailor, we used it to reanalyze published datasets and revealed novel findings worth further experimental validation. The source code and the executable binaries are freely available at https://github.com/jhhung/Tailor.
Subjects
Algorithms, MicroRNAs/chemistry, Small Interfering RNA/chemistry, Sequence Alignment/methods, RNA Sequence Analysis/methods, Animals, Arabidopsis/genetics, Drosophila melanogaster/genetics, HeLa Cells, Humans, Software, Zebrafish/genetics
ABSTRACT
Dynamic exchange of a subset of nucleosomes in vivo plays important roles in epigenetic inheritance of chromatin states, chromatin insulator function, chromosome folding, and the maintenance of the pluripotent state of embryonic stem cells. Here, we extend a pulse-chase strategy for carrying out genome-wide measurements of histone dynamics to several histone variants in murine embryonic stem cells and somatic tissues, recapitulating expected characteristics of the well characterized H3.3 histone variant. We extended this system to the less-studied MacroH2A2 variant, commonly described as a "repressive" histone variant whose accumulation in chromatin is thought to fix the epigenetic state of differentiated cells. Unexpectedly, we found that while large intergenic blocks of MacroH2A2 were stably associated with the genome, promoter-associated peaks of MacroH2A2 exhibited relatively rapid exchange dynamics in ES cells, particularly at highly-transcribed genes. Upon differentiation to embryonic fibroblasts, MacroH2A2 was gained primarily in additional long, stably associated blocks across gene-poor regions, while overall turnover at promoters was greatly dampened. Our results reveal unanticipated dynamic behavior of the MacroH2A2 variant in pluripotent cells, and provide a resource for future studies of tissue-specific histone dynamics in vivo.
Subjects
Chromatin/genetics, Embryonic Stem Cells/metabolism, Epigenomics, Histones/genetics, Animals, CpG Islands/genetics, Embryonic Stem Cells/cytology, Genome, Histones/metabolism, Mice, Nucleosomes/genetics, Nucleosomes/metabolism, Genetic Promoter Regions
ABSTRACT
We have generated optical pulses of 1.2 MW peak power and 0.6 ps duration using a 1060 nm band gain-switched laser diode pulse oscillator. Optical pulses are amplified by three-stage ytterbium-doped fiber amplifiers, and remarkable reductions of amplified spontaneous emission noise and temporal duration have been accomplished based on self-phase modulation in the middle-stage amplifier. After the main amplifier, optical pulses were temporally compressed by a grating pair, and this enabled generation of subpicosecond optical pulses with over 1 MW peak power.
ABSTRACT
BACKGROUND: In modern paired-end sequencing protocols, short DNA fragments lead to adapter-appended reads. Current paired-end adapter removal approaches trim adapters by scanning for adapter fragments at the 3' end of the reads, which is inadequate in some applications. RESULTS: Here, we propose a fast and highly accurate adapter-trimming algorithm, PEAT, designed specifically for paired-end sequencing. PEAT requires no a priori adapter sequence, which is convenient for large-scale meta-analyses. We assessed the performance of PEAT against many adapter trimmers on both simulated and real-life paired-end sequencing libraries. The importance of adapter trimming was exemplified by its influence on downstream analyses of RNA-seq, ChIP-seq and MNase-seq data. Several useful guidelines for applying adapter trimmers with aligners are suggested. CONCLUSIONS: PEAT can be easily included in the routine paired-end sequencing pipeline. The executable binaries and the standalone C++ source code package of PEAT are freely available online.
Subjects
Algorithms, High-Throughput Nucleotide Sequencing/methods, Chromatin Immunoprecipitation, DNA Sequence Analysis, RNA Sequence Analysis, Software, Time Factors
ABSTRACT
Understanding the function of individual microRNA (miRNA) species in mice would require the production of hundreds of loss-of-function strains. To accelerate analysis of miRNA biology in mammals, we combined recombinant adeno-associated virus (rAAV) vectors with miRNA 'tough decoys' (TuDs) to inhibit specific miRNAs. Intravenous injection of rAAV9 expressing anti-miR-122 or anti-let-7 TuDs depleted the corresponding miRNA and increased its mRNA targets. rAAV producing anti-miR-122 TuD but not anti-let-7 TuD reduced serum cholesterol by >30% for 25 weeks in wild-type mice. High-throughput sequencing of liver miRNAs from the treated mice confirmed that the targeted miRNAs were depleted and revealed that TuDs induced miRNA tailing and trimming in vivo. rAAV-mediated miRNA inhibition thus provides a simple way to study miRNA function in adult mammals and a potential therapy for dyslipidemia and other diseases caused by miRNA deregulation.
Subjects
Dependovirus/genetics, Genetic Vectors/genetics, MicroRNAs/antagonists & inhibitors, MicroRNAs/genetics, Animals, Base Sequence, Binding Sites, Cell Line, Cholesterol/metabolism, Liver/metabolism, Mice, Inbred C57BL Mice, MicroRNAs/metabolism, Molecular Sequence Data, Antisense RNA/genetics, Antisense RNA/metabolism, Recombinant Proteins/genetics
ABSTRACT
The Encyclopedia of DNA Elements (ENCODE) consortium aims to identify all functional elements in the human genome, including transcripts and transcriptional regulatory regions, along with their chromatin states and DNA methylation patterns. The ENCODE project generates data using a variety of techniques that can enrich for regulatory regions, such as chromatin immunoprecipitation (ChIP), micrococcal nuclease (MNase) digestion and DNase I digestion, followed by deep sequencing of the resulting DNA. As part of the ENCODE project, we have developed a Web-accessible repository at http://factorbook.org. In Wiki format, factorbook is a transcription factor (TF)-centric repository of all ENCODE ChIP-seq datasets on TF-binding regions, as well as the rich analysis results of these data. In the first release, factorbook contains 457 ChIP-seq datasets on 119 TFs in a number of human cell lines, the average profiles of histone modifications and nucleosome positioning around the TF-binding regions, sequence motifs enriched in the regions and the distance and orientation preferences between motif sites.
Subjects
Genetic Databases, Transcriptional Regulatory Elements, Transcription Factors/metabolism, Binding Sites, Cell Line, Chromatin Immunoprecipitation, High-Throughput Nucleotide Sequencing, Histones, Humans, Internet, Nucleosomes/metabolism, Nucleotide Motifs, DNA Sequence Analysis
ABSTRACT
Since the initial annotation of miRNAs from cloned short RNAs by the Ambros, Tuschl, and Bartel groups in 2001, more than a hundred studies have sought to identify additional miRNAs in various species. We report here a meta-analysis of short RNA data from Drosophila melanogaster, aggregating published libraries with 76 data sets that we generated for the modENCODE project. In total, we began with more than 1 billion raw reads from 187 libraries comprising diverse developmental stages, specific tissue- and cell-types, mutant conditions, and/or Argonaute immunoprecipitations. We elucidated several features of known miRNA loci, including multiple phased byproducts of cropping and dicing, abundant alternative 5' termini of certain miRNAs, frequent 3' untemplated additions, and potential editing events. We also identified 49 novel genomic locations of miRNA production, and 61 additional candidate loci with limited evidence for miRNA biogenesis. Although these loci broaden the Drosophila miRNA catalog, this work supports the notion that a restricted set of cellular transcripts is competent to be specifically processed by the Drosha/Dicer-1 pathway. Unexpectedly, we detected miRNA production from coding and untranslated regions of mRNAs and found the phenomenon of miRNA production from the antisense strand of known loci to be common. Altogether, this study lays a comprehensive foundation for the study of miRNA diversity and evolution in a complex animal model.
Subjects
Drosophila melanogaster/genetics, Gene Expression Regulation, MicroRNAs/genetics, MicroRNAs/metabolism, Molecular Sequence Annotation, Animals, Base Sequence, Cell Line, Computational Biology, Drosophila melanogaster/metabolism, Female, Male, MicroRNAs/chemistry, RNA Editing/genetics, Antisense RNA/chemistry, Antisense RNA/genetics, Messenger RNA/chemistry, Messenger RNA/genetics, Ribonuclease III/genetics, Ribonuclease III/metabolism, Sequence Alignment
ABSTRACT
A central goal of biology is understanding and describing the molecular basis of plasticity: the sets of genes that are combinatorially selected by exogenous and endogenous environmental changes, and the relations among the genes. The most viable current approach to this problem consists of determining whether sets of genes are connected by some common theme, e.g. genes from the same pathway are overrepresented among those whose differential expression in response to a perturbation is most pronounced. There are many approaches to this problem, and the results they produce show a fair amount of dispersion, but they all fall within a common framework consisting of a few basic components. We critically review these components, suggest best practices for carrying out each step, and propose a voting method for meeting the challenge of assessing different methods on a large number of experimental data sets in the absence of a gold standard.
Subjects
Computational Biology/methods, Algorithms, Genetic Databases, Gene Expression, Guidelines as Topic, Humans
ABSTRACT
BACKGROUND: MicroRNAs (miRNAs) play a critical role in down-regulating gene expression. By coupling with Argonaute family proteins, miRNAs bind to target sites on mRNAs and mediate translational repression. A large number of miRNA-target interactions (MTIs) have been identified by crosslinking and immunoprecipitation (CLIP) and photoactivatable-ribonucleoside-enhanced CLIP (PAR-CLIP) combined with next-generation sequencing (NGS). PAR-CLIP shows high efficiency of RNA co-immunoprecipitation, but it also leads to T to C conversion in miRNA-RNA-protein crosslinking regions. This artifact reduces the mappability of reads. However, a specific tool to analyze CLIP and PAR-CLIP data that takes T to C conversion into account is still needed. RESULTS: We herein propose the first CLIP and PAR-CLIP sequencing analysis platform specifically for miRNA target analysis, namely miRTarCLIP. Starting from raw reads, it automatically removes adaptor sequences, filters low-quality reads, reverts C to T, aligns reads to 3'UTRs, scans for read clusters, identifies high-confidence miRNA target sites, and provides annotations from external databases. With multi-threading techniques and our novel C to T reversion procedure, miRTarCLIP greatly reduces the running time compared with conventional approaches. In addition, miRTarCLIP provides a web-based interface that improves the user experience in browsing and searching targets of miRNAs of interest. To demonstrate the functionality of miRTarCLIP, we applied it to two publicly available CLIP and PAR-CLIP sequencing datasets. miRTarCLIP not only produces results comparable to those of other existing tools at a much faster speed, but also reveals interesting features among these putative target sites. 
Specifically, we used miRTarCLIP to show that T to C conversion within positions 1-7 and within positions 8-14 of miRNA target sites are significantly different (p value = 0.02), and even more significantly different when focusing only on sites targeted by the top 102 most highly expressed miRNAs (p value = 0.01). These results are consistent with previous findings and further suggest that combining miRNA expression and PAR-CLIP data can improve the accuracy of miRNA target prediction. CONCLUSION: To sum up, we devised a systematic approach for mining miRNA-target sites from CLIP-seq and PAR-CLIP sequencing data, and integrated the workflow with a graphical web-based browser, which provides a user-friendly interface and detailed annotations of MTIs. We also showed through real-life examples that miRTarCLIP is a powerful tool for understanding miRNAs. Our integrated tool can be accessed online freely at http://miRTarCLIP.mbc.nctu.edu.tw.
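The role of C to T reversion in mapping PAR-CLIP reads can be illustrated with a small Python sketch. It is not miRTarCLIP's actual procedure, which the abstract does not describe in detail; it assumes a simple scheme in which every C is masked to T in both the read and the reference so that T to C conversions no longer block matching, after which the original sequences are compared to report where conversions occurred. Exact substring search stands in for a real aligner, and the function names are hypothetical.

```python
def mask_c_to_t(seq: str) -> str:
    """Collapse C and T into T so that T-to-C conversions cannot block a match."""
    return seq.replace("C", "T")


def find_t_to_c_sites(read: str, reference: str):
    """Locate a read in a reference while tolerating T-to-C conversions.

    Returns a list of (start, conversion_positions) tuples, where
    conversion_positions are read offsets with reference T and read C.
    Hits where the read has T opposite a reference C are rejected,
    because a T-to-C conversion cannot explain that difference.
    """
    masked_ref = mask_c_to_t(reference)
    masked_read = mask_c_to_t(read)
    hits = []
    start = masked_ref.find(masked_read)
    while start != -1:
        window = reference[start:start + len(read)]
        pairs = list(zip(window, read))
        consistent = all(not (r == "C" and q == "T") for r, q in pairs)
        if consistent:
            conversions = [i for i, (r, q) in enumerate(pairs) if r == "T" and q == "C"]
            hits.append((start, conversions))
        start = masked_ref.find(masked_read, start + 1)
    return hits


if __name__ == "__main__":
    reference = "GGATTCAGTAGG"
    read = "ACTCAG"  # reference window ATTCAG with one T-to-C conversion
    print(find_t_to_c_sites(read, reference))
```

Recording the conversion offsets per hit is what would make a downstream comparison such as positions 1-7 versus 8-14 of a target site possible.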