Results 1 - 20 of 38
1.
Cell ; 151(3): 547-58, 2012 Oct 26.
Article in English | MEDLINE | ID: mdl-23101625

ABSTRACT

Retroviral overexpression of reprogramming factors (Oct4, Sox2, Klf4, c-Myc) generates induced pluripotent stem cells (iPSCs). However, the integration of foreign DNA could induce genomic dysregulation. Cell-permeant proteins (CPPs) could overcome this limitation. To date, this approach has proved exceedingly inefficient. We discovered a striking difference in the pattern of gene expression induced by viral versus CPP-based delivery of the reprogramming factors, suggesting that a signaling pathway required for efficient nuclear reprogramming was activated by the retroviral, but not the CPP, approach. In gain- and loss-of-function studies, we find that the toll-like receptor 3 (TLR3) pathway enables efficient induction of pluripotency by viral or mmRNA approaches. Stimulation of TLR3 causes rapid and global changes in the expression of epigenetic modifiers to enhance chromatin remodeling and nuclear reprogramming. Activation of inflammatory pathways is required for efficient nuclear reprogramming in the induction of pluripotency.


Subjects
Cell-Penetrating Peptides/metabolism , Cellular Reprogramming , Immunity, Innate , Induced Pluripotent Stem Cells/metabolism , Signal Transduction , Cell Line , Fibroblasts/metabolism , Humans , Inflammation/metabolism , Kruppel-Like Factor 4 , NF-kappa B/metabolism , Octamer Transcription Factor-3/metabolism , Retroviridae/metabolism , Toll-Like Receptor 3/metabolism
2.
Bioinformatics ; 38(23): 5245-5252, 2022 11 30.
Article in English | MEDLINE | ID: mdl-36250792

ABSTRACT

MOTIVATION: Clustered regularly interspaced short palindromic repeats (CRISPR)-based genetic perturbation screening is a powerful tool to probe gene function. However, experimental noise, especially for lowly expressed genes, needs to be accounted for to maintain proper control of the false positive rate. METHODS: We develop a statistical method, named CRISPR screen with Expression Data Analysis (CEDA), to integrate gene expression profiles and CRISPR screen data for identifying essential genes. CEDA stratifies genes based on expression level and adopts a three-component mixture model for the log-fold change of single-guide RNAs (sgRNAs). An empirical Bayesian prior and the expectation-maximization algorithm are used for parameter estimation and false discovery rate inference. RESULTS: Taking advantage of gene expression data, CEDA identifies essential genes with higher expression. Compared to existing methods, CEDA shows comparable reliability but higher sensitivity in detecting essential genes with moderate sgRNA fold change. Therefore, using the same CRISPR data, CEDA generates an additional hit gene list. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
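
The three-component mixture model and expectation-maximization step mentioned in this abstract can be illustrated with a minimal sketch. The code below is not CEDA's implementation: it assumes Gaussian components, omits the expression-level stratification and the empirical Bayes prior, and uses hypothetical names (`fit_three_component_mixture`, `lfc`) chosen only for illustration.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of a normal distribution with the given mean and standard deviation."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def fit_three_component_mixture(lfc, n_iter=200, min_sd=1e-3):
    """Fit a 3-component Gaussian mixture to log-fold changes with plain EM.

    Illustrative assumption: the components stand for depleted, unchanged
    and enriched sgRNAs. Returns (weights, means, sds, responsibilities).
    """
    lo, hi = min(lfc), max(lfc)
    centre = sum(lfc) / len(lfc)
    means = [lo, centre, hi]                      # simple data-driven start
    sds = [max((hi - lo) / 6.0, min_sd)] * 3
    weights = [1.0 / 3.0] * 3
    resp = []

    for _ in range(n_iter):
        # E-step: posterior probability of each component for each value.
        resp = []
        for x in lfc:
            dens = [w * normal_pdf(x, m, s)
                    for w, m, s in zip(weights, means, sds)]
            total = sum(dens) or 1e-300           # guard against underflow to 0
            resp.append([d / total for d in dens])

        # M-step: re-estimate weight, mean and spread of each component.
        for k in range(3):
            nk = sum(r[k] for r in resp)
            if nk < 1e-12:
                continue                          # leave an empty component as is
            weights[k] = nk / len(lfc)
            means[k] = sum(r[k] * x for r, x in zip(resp, lfc)) / nk
            var = sum(r[k] * (x - means[k]) ** 2
                      for r, x in zip(resp, lfc)) / nk
            sds[k] = max(math.sqrt(var), min_sd)  # floor avoids collapse to 0

    return weights, means, sds, resp
```

The per-value responsibilities returned in the last element are the quantities on which a posterior-based false discovery rate could be built; how CEDA actually does this is described in the paper, not here.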


Subjects
Clustered Regularly Interspaced Short Palindromic Repeats , Genes, Essential , Bayes Theorem , CRISPR-Cas Systems , Gene Expression , Reproducibility of Results , RNA, Small Untranslated/genetics
3.
Genome Res ; 29(8): 1329-1342, 2019 08.
Article in English | MEDLINE | ID: mdl-31201211

ABSTRACT

Genome-wide chromatin accessibility and nucleosome occupancy profiles have been widely investigated, while the long-range dynamics remain poorly studied at the single-cell level. Here, we present a new experimental approach, methyltransferase treatment followed by single-molecule long-read sequencing (MeSMLR-seq), for long-range mapping of nucleosomes and chromatin accessibility at single DNA molecules and thus achieve comprehensive-coverage characterization of the corresponding heterogeneity. MeSMLR-seq offers direct measurements of both nucleosome-occupied and nucleosome-evicted regions on a single DNA molecule, which is challenging for many existing methods. We applied MeSMLR-seq to haploid yeast, where single DNA molecules represent single cells, and thus we could investigate the combinatorics of many (up to 356) nucleosomes at long range in single cells. We illustrated the differential organization principles of nucleosomes surrounding the transcription start site for silent and actively transcribed genes, at the single-cell level and in the long-range scale. The heterogeneous patterns of chromatin status spanning multiple genes were phased. Together with single-cell RNA-seq data, we quantitatively revealed how chromatin accessibility correlated with gene transcription positively in a highly heterogeneous scenario. Moreover, we quantified the openness of promoters and investigated the coupled chromatin changes of adjacent genes at single DNA molecules during transcription reprogramming. In addition, we revealed the coupled changes of chromatin accessibility for two neighboring glucose transporter genes in response to changes in glucose concentration.


Subjects
Euchromatin/metabolism , Gene Expression Regulation, Fungal , Histones/genetics , Saccharomyces cerevisiae/genetics , Transcription, Genetic , Chromosome Mapping , DNA, Fungal/genetics , DNA, Fungal/metabolism , Euchromatin/chemistry , Glucose/metabolism , Glucose Transport Proteins, Facilitative/genetics , Glucose Transport Proteins, Facilitative/metabolism , High-Throughput Nucleotide Sequencing , Histones/metabolism , Methyltransferases/chemistry , Monosaccharide Transport Proteins/genetics , Monosaccharide Transport Proteins/metabolism , Nucleosomes/chemistry , Nucleosomes/metabolism , Promoter Regions, Genetic , Saccharomyces cerevisiae/metabolism , Saccharomyces cerevisiae Proteins/genetics , Saccharomyces cerevisiae Proteins/metabolism , Single-Cell Analysis/methods , Transcription Initiation Site
4.
Bioinformatics ; 37(Suppl_1): i477-i483, 2021 07 12.
Article in English | MEDLINE | ID: mdl-34252938

ABSTRACT

MOTIVATION: Oxford Nanopore Technologies sequencing devices support adaptive sequencing, in which undesired reads can be ejected from a pore in real time. This feature allows targeted sequencing aided by computational methods for mapping partial reads, rather than complex library preparation protocols. However, existing mapping methods either require a computationally expensive base-calling procedure before using aligners to map partial reads or work well only on small genomes. RESULTS: In this work, we present a new streaming method that can map nanopore raw signals for real-time selective sequencing. Rather than converting read signals to bases, we propose to convert reference genomes to signals and fully operate in the signal space. Our method features a new way to index reference genomes using k-d trees, a novel seed selection strategy and a seed chaining algorithm tailored toward the current signal characteristics. We implemented the method as a tool Sigmap. Then we evaluated it on both simulated and real data and compared it to the state-of-the-art nanopore raw signal mapper Uncalled. Our results show that Sigmap yields comparable performance on mapping yeast simulated raw signals, and better mapping accuracy on mapping yeast real raw signals with a 4.4× speedup. Moreover, our method performed well on mapping raw signals to genomes of size >100 Mbp and correctly mapped 11.49% more real raw signals of green algae, which leads to a significantly higher F1-score (0.9354 versus 0.8660). AVAILABILITY AND IMPLEMENTATION: Sigmap code is accessible at https://github.com/haowenz/sigmap. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
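
The idea of indexing a reference in signal space with a k-d tree can be sketched in a few lines. This is only an illustration under stated assumptions, not Sigmap's index: the reference is taken to be already converted to a list of expected current levels, each seed is a window of `dim` consecutive values, and the names `index_reference_signal`, `build_kd_tree` and `nearest` are hypothetical. Seed selection and chaining are not shown.

```python
def build_kd_tree(points, depth=0):
    """Build a k-d tree from (vector, reference_position) pairs."""
    if not points:
        return None
    axis = depth % len(points[0][0])              # cycle through dimensions
    points = sorted(points, key=lambda p: p[0][axis])
    mid = len(points) // 2                        # median becomes the split node
    return {
        "point": points[mid],
        "axis": axis,
        "left": build_kd_tree(points[:mid], depth + 1),
        "right": build_kd_tree(points[mid + 1:], depth + 1),
    }


def nearest(node, target, best=None):
    """Return (squared_distance, reference_position) of the closest seed."""
    if node is None:
        return best
    vec, pos = node["point"]
    dist = sum((a - b) ** 2 for a, b in zip(vec, target))
    if best is None or dist < best[0]:
        best = (dist, pos)
    axis = node["axis"]
    diff = target[axis] - vec[axis]
    near, far = ((node["left"], node["right"]) if diff < 0
                 else (node["right"], node["left"]))
    best = nearest(near, target, best)
    # Visit the far side only if the splitting plane is closer than the best hit.
    if diff * diff < best[0]:
        best = nearest(far, target, best)
    return best


def index_reference_signal(signal, dim):
    """Index every window of `dim` consecutive signal values by start position."""
    points = [(tuple(signal[i:i + dim]), i)
              for i in range(len(signal) - dim + 1)]
    return build_kd_tree(points)
```

A read-signal window is then looked up with `nearest(tree, window)`, which returns the squared distance and the start position of the most similar reference window.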


Subjects
Nanopores , Algorithms , Genome , High-Throughput Nucleotide Sequencing , Sequence Analysis, DNA , Software
5.
Proc Natl Acad Sci U S A ; 116(35): 17470-17479, 2019 08 27.
Article in English | MEDLINE | ID: mdl-31395738

ABSTRACT

The most frequently mutated protein in human cancer is p53, a transcription factor (TF) that regulates myriad genes instrumental in diverse cellular outcomes including growth arrest and cell death. Cell context-dependent p53 modulation is critical for this life-or-death balance, yet remains incompletely understood. Here we identify sequence signatures enriched in genomic p53-binding sites modulated by the transcription cofactor iASPP. Moreover, our p53-iASPP crystal structure reveals that iASPP displaces the p53 L1 loop, which mediates sequence-specific interactions with the signature-corresponding base, without perturbing other DNA-recognizing modules of the p53 DNA-binding domain. A TF commonly uses multiple structural modules to recognize its cognate DNA, and thus this mechanism of a cofactor fine-tuning TF-DNA interactions through targeting a particular module is likely widespread. Previously, all tumor suppressors and oncoproteins that associate with the p53 DNA-binding domain, except the oncogenic E6 from human papillomaviruses (HPVs), structurally cluster at the DNA-binding site of p53, complicating drug design. By contrast, iASPP inhibits p53 through a distinct surface overlapping the E6 footprint, opening prospects for p53-targeting precision medicine to improve cancer therapy.


Subjects
DNA/genetics , DNA/metabolism , Intracellular Signaling Peptides and Proteins/metabolism , Repressor Proteins/metabolism , Response Elements , Tumor Suppressor Protein p53/metabolism , Base Sequence , Binding Sites , Cell Line, Tumor , DNA/chemistry , Gene Expression Profiling , Humans , Intracellular Signaling Peptides and Proteins/chemistry , Models, Molecular , Nucleotide Motifs , Oncogene Proteins, Viral/chemistry , Oncogene Proteins, Viral/metabolism , Protein Binding , Protein Conformation , Repressor Proteins/chemistry , Structure-Activity Relationship , Tumor Suppressor Protein p53/chemistry
6.
Brief Bioinform ; 20(6): 2306-2315, 2019 11 27.
Article in English | MEDLINE | ID: mdl-30239581

ABSTRACT

Intra-tumor heterogeneity is associated with cancer progression and therapeutic resistance, for example in breast cancer. While the existing methods for studying tumor heterogeneity only analyze variant allele frequency (VAF), the genotype of variants, which can be detected by long reads or paired-end reads, is also informative for inferring subclones. We developed GenoClone to integrate VAF with the genotype of variants, and it showed superior performance in inferring the number of subclones, estimating the fractions of subclones and identifying the somatic single-nucleotide variant composition of subclones. When GenoClone was applied to 389 TCGA breast cancer samples, it revealed extensive intra-tumor heterogeneity. We further found that a few somatic mutations were relevant to the late stage of tumor evolution, including the ones at the oncogene PIK3CA and the tumor suppressor gene TP53. Moreover, 52 subclones that were identified from 167 samples shared high similarity of somatic mutations, which were clustered into three groups with the sizes of 24, 14 and 14. This is helpful for understanding the development of breast cancer in certain subgroups of people and for drug development at the population level. Furthermore, GenoClone also identified the tumor heterogeneity in different aliquots of the same samples. The implementation of GenoClone is available at http://www.healthcare.uiowa.edu/labs/au/GenoClone/.
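
Variant allele frequency, the quantity this abstract builds on, is simply the share of reads at a site that carry the variant. The sketch below shows that calculation, plus the textbook conversion to a cellular fraction under strong simplifying assumptions (heterozygous variant, diploid locus without copy-number change, a sample consisting only of tumor cells). It is not GenoClone's model, and the function names are hypothetical.

```python
def variant_allele_frequency(alt_reads, ref_reads):
    """Fraction of reads covering a site that support the variant allele."""
    depth = alt_reads + ref_reads
    if depth == 0:
        raise ValueError("site has no read coverage")
    return alt_reads / depth


def cellular_fraction_from_vaf(vaf):
    """Fraction of cells carrying a heterozygous variant.

    Assumes a diploid locus with no copy-number change and a pure tumor
    sample, so each carrying cell contributes one variant allele out of two.
    The result is capped at 1.0 because sampling noise can push VAF above 0.5.
    """
    return min(1.0, 2.0 * vaf)
```

For example, 30 variant reads and 70 reference reads give a VAF of 0.3, which under these assumptions corresponds to a variant present in about 60% of cells.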


Subjects
Breast Neoplasms/pathology , Genetic Linkage , Germ-Line Mutation , Breast Neoplasms/genetics , Class I Phosphatidylinositol 3-Kinases/genetics , Female , Genotype , Humans , Monte Carlo Method , Polymorphism, Single Nucleotide , Tumor Suppressor Protein p53/genetics
7.
Bioinformatics ; 34(13): 2168-2176, 2018 07 01.
Article in English | MEDLINE | ID: mdl-29905763

ABSTRACT

Motivation: In recent years, long read (LR) sequencing technologies, such as Pacific Biosciences and Oxford Nanopore Technologies, have been demonstrated to substantially improve the quality of genome assembly and transcriptome characterization. Compared to the high cost of genome assembly by LR sequencing, it is more affordable to generate LRs for transcriptome characterization. That is, when informative transcriptome LR data are available without a high-quality genome, a method for de novo transcriptome assembly and annotation is in high demand. Results: Without a reference genome, IDP-denovo performs de novo transcriptome assembly, isoform annotation and quantification by integrating the strengths of LRs and short reads. Using the GM12878 human data as a gold standard, we demonstrated that IDP-denovo had superior sensitivity of transcript assembly and high accuracy of isoform annotation. In addition, IDP-denovo outputs two abundance indices to provide a comprehensive expression profile of genes/isoforms. IDP-denovo represents a robust approach for transcriptome assembly, isoform annotation and quantification for non-model organism studies. Applying IDP-denovo to a non-model organism, Dendrobium officinale, we discovered a number of novel genes and novel isoforms that were not reported by the existing annotation library. These results reveal the high diversity of gene isoforms in D. officinale, which was not reported in the existing annotation library. Availability and implementation: The dataset of Dendrobium officinale used/analyzed during the current study has been deposited in SRA, with accession code SRP094520. IDP-denovo is available for download at www.healthcare.uiowa.edu/labs/au/IDP-denovo/. Supplementary information: Supplementary data are available at Bioinformatics online.


Subjects
Alternative Splicing , Gene Expression Profiling/methods , Gene Library , Dendrobium/genetics , High-Throughput Nucleotide Sequencing/methods , Humans , Sequence Analysis, RNA/methods
8.
Nucleic Acids Res ; 45(5): e32, 2017 03 17.
Article in English | MEDLINE | ID: mdl-27899656

ABSTRACT

Allele-specific expression (ASE) is a fundamental problem in studying gene regulation and diploid transcriptome profiles, with two key challenges: (i) haplotyping and (ii) estimation of ASE at the gene isoform level. Existing ASE analysis methods are limited by a dependence on haplotyping from laborious experiments or extra genome/family trio data. In addition, there is a lack of methods for gene isoform level ASE analysis. We developed a tool, IDP-ASE, for full ASE analysis. By innovative integration of Third Generation Sequencing (TGS) long reads with Second Generation Sequencing (SGS) short reads, the accuracy of haplotyping and ASE quantification at the gene and gene isoform level was greatly improved, as demonstrated by the gold standard GM12878 data and semi-simulation data. In addition to methodology development, applications of IDP-ASE to human embryonic stem cells and breast cancer cells indicate that the imbalance of ASE and non-uniformity of gene isoform ASE are widespread, including in tumorigenesis-relevant genes and pluripotency markers. These results show that gene isoform expression and allele-specific expression cooperate to provide high diversity and complexity of gene regulation and expression, highlighting the importance of studying ASE at the gene isoform level. Our study provides a robust bioinformatics solution to understand ASE using RNA sequencing data only.


Subjects
Alleles , Haplotypes , High-Throughput Nucleotide Sequencing/methods , RNA Isoforms/genetics , RNA, Messenger/genetics , Transcriptome , Gene Expression Regulation , Human Embryonic Stem Cells/cytology , Human Embryonic Stem Cells/metabolism , Humans , MCF-7 Cells , RNA Isoforms/metabolism , RNA, Messenger/metabolism , Sequence Analysis, RNA
9.
Nucleic Acids Res ; 43(18): e116, 2015 Oct 15.
Article in English | MEDLINE | ID: mdl-26040699

ABSTRACT

We developed an innovative hybrid sequencing approach, IDP-fusion, to detect fusion genes, determine fusion sites and identify and quantify fusion isoforms. IDP-fusion is the first method to study gene fusion events by integrating Third Generation Sequencing long reads and Second Generation Sequencing short reads. We applied IDP-fusion to PacBio data and Illumina data from the MCF-7 breast cancer cells. Compared with the existing tools, IDP-fusion detects fusion genes at higher precision and a very low false positive rate. The results show that IDP-fusion will be useful for unraveling the complexity of multiple fusion splices and fusion isoforms within tumorigenesis-relevant fusion genes.


Subjects
Carcinogenesis/genetics , Gene Expression Profiling , Gene Fusion , High-Throughput Nucleotide Sequencing/methods , Breast Neoplasms/genetics , Breast Neoplasms/metabolism , Female , Humans , MCF-7 Cells , Protein Isoforms/genetics , Protein Isoforms/metabolism , Sequence Alignment
10.
Plant J ; 82(6): 951-961, 2015 Jun.
Article in English | MEDLINE | ID: mdl-25912611

ABSTRACT

Danshen, Salvia miltiorrhiza Bunge, is one of the most widely used herbs in traditional Chinese medicine, wherein its rhizome/roots are particularly valued. The corresponding bioactive components include the tanshinone diterpenoids, the biosynthesis of which is a subject of considerable interest. Previous investigations of the S. miltiorrhiza transcriptome have relied on short-read next-generation sequencing (NGS) technology, and the vast majority of the resulting isotigs do not represent full-length cDNA sequences. Moreover, these efforts have been targeted at either whole plants or hairy root cultures. Here, we demonstrate that the tanshinone pigments are produced and accumulate in the root periderm, and apply a combination of NGS and single-molecule real-time (SMRT) sequencing to various root tissues, particularly including the periderm, to provide a more complete view of the S. miltiorrhiza transcriptome, with further insight into tanshinone biosynthesis as well. In addition, the use of SMRT long-read sequencing offered the ability to examine alternative splicing, which was found to occur in approximately 40% of the detected gene loci, including several involved in isoprenoid/terpenoid metabolism.


Subjects
Abietanes/biosynthesis , Alternative Splicing , Plant Roots/genetics , Salvia miltiorrhiza/genetics , Abietanes/metabolism , Gene Expression Profiling/methods , Gene Expression Regulation, Plant , High-Throughput Nucleotide Sequencing/methods , Plant Proteins/genetics , Plant Proteins/metabolism , Plant Roots/metabolism , Salvia miltiorrhiza/metabolism , Sequence Analysis, DNA/methods , Transcriptome
11.
Genome Res ; 23(1): 201-16, 2013 Jan.
Article in English | MEDLINE | ID: mdl-22960373

ABSTRACT

The Xenopus embryo has provided key insights into fate specification, the cell cycle, and other fundamental developmental and cellular processes, yet a comprehensive understanding of its transcriptome is lacking. Here, we used paired end RNA sequencing (RNA-seq) to explore the transcriptome of Xenopus tropicalis in 23 distinct developmental stages. We determined expression levels of all genes annotated in RefSeq and Ensembl and showed for the first time on a genome-wide scale that, despite a general state of transcriptional silence in the earliest stages of development, approximately 150 genes are transcribed prior to the midblastula transition. In addition, our splicing analysis uncovered more than 10,000 novel splice junctions at each stage and revealed that many known genes have additional unannotated isoforms. Furthermore, we used Cufflinks to reconstruct transcripts from our RNA-seq data and found that ∼13.5% of the final contigs are derived from novel transcribed regions, both within introns and in intergenic regions. We then developed a filtering pipeline to separate protein-coding transcripts from noncoding RNAs and identified a confident set of 6686 noncoding transcripts in 3859 genomic loci. Since the current reference genome, XenTro3, consists of hundreds of scaffolds instead of full chromosomes, we also performed de novo reconstruction of the transcriptome using Trinity and uncovered hundreds of transcripts that are missing from the genome. Collectively, our data will not only aid in completing the assembly of the Xenopus tropicalis genome but will also serve as a valuable resource for gene discovery and for unraveling the fundamental mechanisms of vertebrate embryogenesis.


Subjects
Gene Expression Regulation, Developmental , Sequence Analysis, RNA , Transcriptome , Xenopus/genetics , Animals , Ecthyma, Contagious , Embryo, Nonmammalian/metabolism , Introns , Larva/genetics , Larva/metabolism , Physical Chromosome Mapping , RNA Splicing , RNA, Untranslated , Sequence Alignment , Xenopus/growth & development
12.
Proc Natl Acad Sci U S A ; 110(50): E4821-30, 2013 Dec 10.
Article in English | MEDLINE | ID: mdl-24282307

ABSTRACT

Although transcriptional and posttranscriptional events are detected in RNA-Seq data from second-generation sequencing, full-length mRNA isoforms are not captured. On the other hand, third-generation sequencing, which yields much longer reads, has current limitations of lower raw accuracy and throughput. Here, we combine second-generation sequencing and third-generation sequencing with a custom-designed method for isoform identification and quantification to generate a high-confidence isoform dataset for human embryonic stem cells (hESCs). We report 8,084 RefSeq-annotated isoforms detected as full-length and an additional 5,459 isoforms predicted through statistical inference. Over one-third of these are novel isoforms, including 273 RNAs from gene loci that have not previously been identified. Further characterization of the novel loci indicates that a subset is expressed in pluripotent cells but not in diverse fetal and adult tissues; moreover, their reduced expression perturbs the network of pluripotency-associated genes. Results suggest that gene identification, even in well-characterized human cell lines and tissues, is likely far from complete.


Subjects
Alternative Splicing/genetics , Embryonic Stem Cells/metabolism , Gene Expression Profiling/methods , High-Throughput Nucleotide Sequencing/methods , Protein Isoforms/genetics , Transcriptome/genetics , Embryonic Stem Cells/chemistry , Humans , Male
13.
Mol Syst Biol ; 9: 632, 2013.
Article in English | MEDLINE | ID: mdl-23295861

ABSTRACT

Landmark events occur in a coordinated manner during pre-implantation development of the mammalian embryo, yet the regulatory network that orchestrates these events remains largely unknown. Here, we present the first systematic investigation of the network in pre-implantation mouse embryos using morpholino-mediated gene knockdowns of key embryonic stem cell (ESC) factors followed by detailed transcriptome analysis of pooled embryos, single embryos, and individual blastomeres. We delineated the regulons of Oct4, Sall4, and Nanog and identified a set of metabolism- and transport-related genes that were controlled by these transcription factors in embryos but not in ESCs. Strikingly, the knockdown embryos arrested at a range of developmental stages. We provided evidence that the DNA methyltransferase Dnmt3b has a role in determining the extent to which a knockdown embryo can develop. We further showed that the feed-forward loop comprising Dnmt3b, the pluripotency factors, and the miR-290-295 cluster exemplifies a network motif that buffers embryos against gene expression noise. Our findings indicate that Oct4, Sall4, and Nanog form a robust and integrated network to govern mammalian pre-implantation development.


Subjects
Blastocyst/physiology , DNA-Binding Proteins/genetics , Embryonic Stem Cells/physiology , Gene Regulatory Networks , Homeodomain Proteins/genetics , Octamer Transcription Factor-3/genetics , Transcription Factors/genetics , Animals , Blastocyst/metabolism , DNA (Cytosine-5-)-Methyltransferases/genetics , DNA (Cytosine-5-)-Methyltransferases/metabolism , DNA-Binding Proteins/metabolism , Embryo Culture Techniques , Embryo, Mammalian/metabolism , Embryonic Development , Female , Gene Expression Profiling , Gene Expression Regulation, Developmental , Gene Knockdown Techniques , Homeodomain Proteins/metabolism , Male , Mice , Mice, Inbred C57BL , Mice, Inbred DBA , MicroRNAs/genetics , Nanog Homeobox Protein , Octamer Transcription Factor-3/metabolism , Oligonucleotide Array Sequence Analysis , Transcription Factors/metabolism , DNA Methyltransferase 3B
14.
Nat Biotechnol ; 42(4): 591-596, 2024 Apr.
Article in English | MEDLINE | ID: mdl-37349523

ABSTRACT

Current N6-methyladenosine (m6A) mapping methods need large amounts of RNA or are limited to cultured cells. Through optimized sample recovery and signal-to-noise ratio, we developed picogram-scale m6A RNA immunoprecipitation and sequencing (picoMeRIP-seq) for studying m6A in vivo in single cells and scarce cell types using standard laboratory equipment. We benchmark m6A mapping on titrations of poly(A) RNA and embryonic stem cells and in single zebrafish zygotes, mouse oocytes and embryos.


Subjects
RNA , Zebrafish , Animals , Mice , Zebrafish/genetics , Zebrafish/metabolism , RNA/genetics , RNA, Messenger/genetics , Embryonic Stem Cells , Cells, Cultured
15.
Nat Struct Mol Biol ; 30(5): 703-709, 2023 05.
Article in English | MEDLINE | ID: mdl-37081317

ABSTRACT

Despite the significance of N6-methyladenosine (m6A) in gene regulation, the requirement for large amounts of RNA has hindered m6A profiling in mammalian early embryos. Here we apply low-input methyl RNA immunoprecipitation and sequencing to map m6A in mouse oocytes and preimplantation embryos. We define the landscape of m6A during the maternal-to-zygotic transition, including stage-specifically expressed transcription factors essential for cell fate determination. Both the maternally inherited transcripts to be degraded post fertilization and the zygotically activated genes during zygotic genome activation are widely marked by m6A. In contrast to m6A-marked zygotically activated genes, m6A-marked maternally inherited transcripts have a higher tendency to be targeted by microRNAs. Moreover, RNAs derived from retrotransposons, such as MTA that is maternally expressed and MERVL that is transcriptionally activated at the two-cell stage, are largely marked by m6A. Our results provide a foundation for future studies exploring the regulatory roles of m6A in mammalian early embryonic development.


Subjects
Gene Expression Regulation, Developmental , MicroRNAs , Animals , Mice , Blastocyst , Oocytes/metabolism , Embryonic Development/genetics , Zygote , MicroRNAs/metabolism , Mammals/genetics
16.
Nucleic Acids Res ; 38(14): 4570-8, 2010 Aug.
Article in English | MEDLINE | ID: mdl-20371516

ABSTRACT

Alternative splicing is a prevalent post-transcriptional process, which is not only important to normal cellular function but is also involved in human diseases. The newly developed second generation sequencing technique provides high-throughput data (RNA-seq data) to study alternative splicing events in different types of cells. Here, we present a computational method, SpliceMap, to detect splice junctions from RNA-seq data. This method does not depend on any existing annotation of gene structures and is capable of finding novel splice junctions with high sensitivity and specificity. It can handle long reads (50-100 nt) and can exploit paired-read information to improve mapping accuracy. Several parameters are included in the output to indicate the reliability of the predicted junction and help filter out false predictions. We applied SpliceMap to analyze 23 million paired 50-nt reads from human brain tissue. The results show that, at this depth of sequencing, RNA-seq can support reliable detection of splice junctions except for those that are present at very low levels. Compared to current methods, SpliceMap can achieve 12% higher sensitivity without sacrificing specificity.


Subjects
Alternative Splicing , RNA Splice Sites , Sequence Analysis, RNA , Software , Algorithms , Computational Biology/methods , Humans , Polymerase Chain Reaction
17.
Nat Biotechnol ; 39(11): 1348-1365, 2021 11.
Article in English | MEDLINE | ID: mdl-34750572

ABSTRACT

Rapid advances in nanopore technologies for sequencing single long DNA and RNA molecules have led to substantial improvements in accuracy, read length and throughput. These breakthroughs have required extensive development of experimental and bioinformatics methods to fully exploit nanopore long reads for investigations of genomes, transcriptomes, epigenomes and epitranscriptomes. Nanopore sequencing is being applied in genome assembly, full-length transcript detection and base modification detection and in more specialized areas, such as rapid clinical diagnoses and outbreak surveillance. Many opportunities remain for improving data quality and analytical approaches through the development of new nanopores, base-calling methods and experimental protocols tailored to particular applications.


Subjects
Nanopore Sequencing , Nanopores , Computational Biology , High-Throughput Nucleotide Sequencing/methods , Sequence Analysis, DNA/methods , Technology
18.
Nat Commun ; 12(1): 1361, 2021 03 01.
Article in English | MEDLINE | ID: mdl-33649327

ABSTRACT

Sperm contributes diverse RNAs to the zygote. While sperm small RNAs have been shown to impact offspring phenotypes, our knowledge of the sperm transcriptome, especially the composition of long RNAs, has been limited by the lack of sensitive, high-throughput experimental techniques that can distinguish intact RNAs from fragmented RNAs, known to abound in sperm. Here, we integrate single-molecule long-read sequencing with short-read sequencing to detect sperm intact RNAs (spiRNAs). We identify 3440 spiRNA species in mice and 4100 in humans. The spiRNA profile consists of both mRNAs and long non-coding RNAs, is evolutionarily conserved between mice and humans, and displays an enrichment in mRNAs encoding for ribosome. In sum, we characterize the landscape of intact long RNAs in sperm, paving the way for future studies on their biogenesis and functions. Our experimental and bioinformatics approaches can be applied to other tissues and organisms to detect intact transcripts.


Subjects
Conserved Sequence/genetics , High-Throughput Nucleotide Sequencing/methods , RNA/genetics , Single Molecule Imaging , Spermatozoa/metabolism , Animals , Evolution, Molecular , Gene Ontology , Humans , Male , Mice, Inbred C57BL , RNA/metabolism , RNA, Long Noncoding/genetics , RNA, Long Noncoding/metabolism , RNA, Messenger/genetics , RNA, Messenger/metabolism , Ribosomes/metabolism , Testis/metabolism , Transcriptome/genetics
19.
Genome Biol ; 21(1): 14, 2020 01 17.
Article in English | MEDLINE | ID: mdl-31952552

ABSTRACT

The error-prone third-generation sequencing (TGS) long reads can be corrected by the high-quality second-generation sequencing (SGS) short reads, which is referred to as hybrid error correction. We here investigate the influences of the principal algorithmic factors of two major types of hybrid error correction methods by mathematical modeling and analysis on both simulated and real data. Our study reveals the distribution of accuracy gain with respect to the original long read error rate. We also demonstrate that the original error rate of 19% is the limit for perfect correction, beyond which long reads are too error-prone to be corrected by these methods.


Subjects
High-Throughput Nucleotide Sequencing/methods , Sequence Alignment , Algorithms
20.
Genome Biol ; 20(1): 26, 2019 02 04.
Article in English | MEDLINE | ID: mdl-30717772

ABSTRACT

BACKGROUND: Third-generation sequencing technologies have advanced biological research by generating reads that are substantially longer than those of second-generation sequencing technologies. However, their notoriously high error rate impedes straightforward data analysis and limits their application. A handful of error correction methods for these error-prone long reads have been developed to date. The output data quality is very important for downstream analysis, whereas computing resources could limit the utility of some computing-intense tools. There is a lack of standardized assessments for these long-read error-correction methods. RESULTS: Here, we present a comparative performance assessment of ten state-of-the-art error-correction methods for long reads. We established a common set of benchmarks for performance assessment, including sensitivity, accuracy, output rate, alignment rate, output read length, run time, and memory usage, as well as the effects of error correction on two downstream applications of long reads: de novo assembly and resolving haplotype sequences. CONCLUSIONS: Taking into account all of these metrics, we provide a suggestive guideline for method choice based on available data size, computing resources, and individual research goals.


Subjects
Genomics/methods , Sequence Analysis, DNA , Software/statistics & numerical data , Animals , Arabidopsis , Drosophila melanogaster , Escherichia coli , Saccharomyces cerevisiae , Scientific Experimental Error , Sequence Alignment