Results 1 - 20 of 38
1.
Cell ; 151(3): 547-58, 2012 Oct 26.
Article in English | MEDLINE | ID: mdl-23101625

ABSTRACT

Retroviral overexpression of reprogramming factors (Oct4, Sox2, Klf4, c-Myc) generates induced pluripotent stem cells (iPSCs). However, the integration of foreign DNA could induce genomic dysregulation. Cell-permeant proteins (CPPs) could overcome this limitation. To date, this approach has proved exceedingly inefficient. We discovered a striking difference in the pattern of gene expression induced by viral versus CPP-based delivery of the reprogramming factors, suggesting that a signaling pathway required for efficient nuclear reprogramming was activated by the retroviral but not the CPP approach. In gain- and loss-of-function studies, we find that the toll-like receptor 3 (TLR3) pathway enables efficient induction of pluripotency by viral or mmRNA approaches. Stimulation of TLR3 causes rapid and global changes in the expression of epigenetic modifiers to enhance chromatin remodeling and nuclear reprogramming. Activation of inflammatory pathways is required for efficient nuclear reprogramming in the induction of pluripotency.


Subject(s)
Cell-Penetrating Peptides/metabolism , Cellular Reprogramming , Immunity, Innate , Induced Pluripotent Stem Cells/metabolism , Signal Transduction , Cell Line , Fibroblasts/metabolism , Humans , Inflammation/metabolism , Kruppel-Like Factor 4 , NF-kappa B/metabolism , Octamer Transcription Factor-3/metabolism , Retroviridae/metabolism , Toll-Like Receptor 3/metabolism
2.
Bioinformatics ; 38(23): 5245-5252, 2022 11 30.
Article in English | MEDLINE | ID: mdl-36250792

ABSTRACT

MOTIVATION: The clustered regularly interspaced short palindromic repeats (CRISPR)-based genetic perturbation screen is a powerful tool to probe gene function. However, experimental noise, especially for lowly expressed genes, needs to be accounted for to maintain proper control of the false positive rate. METHODS: We develop a statistical method, named CRISPR screen with Expression Data Analysis (CEDA), to integrate gene expression profiles and CRISPR screen data for identifying essential genes. CEDA stratifies genes based on expression level and adopts a three-component mixture model for the log-fold change of single-guide RNAs (sgRNAs). An empirical Bayesian prior and the expectation-maximization algorithm are used for parameter estimation and false discovery rate inference. RESULTS: Taking advantage of gene expression data, CEDA identifies essential genes with higher expression. Compared to existing methods, CEDA shows comparable reliability but higher sensitivity in detecting essential genes with moderate sgRNA fold change. Therefore, using the same CRISPR data, CEDA generates an additional hit gene list. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Clustered Regularly Interspaced Short Palindromic Repeats , Genes, Essential , Bayes Theorem , CRISPR-Cas Systems , Gene Expression , Reproducibility of Results , RNA, Small Untranslated/genetics
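
The CEDA abstract above mentions a three-component mixture model fitted with expectation-maximization. As a rough illustration of that general idea only, the sketch below fits a one-dimensional three-component Gaussian mixture to toy log-fold changes. The Gaussian components, the percentile-based initialisation, the simulated values and every function name are assumptions made for this example; the abstract does not specify CEDA's distributions, and the expression stratification and empirical Bayesian prior are not modelled here.

```python
import math
import random


def normal_pdf(x, mean, sd):
    """Density of a normal distribution at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def simulate_log_fold_changes(seed=0, n_per_group=200):
    """Draw toy log-fold changes from three well-separated groups (made-up values)."""
    rng = random.Random(seed)
    centres = (-3.0, 0.0, 3.0)
    return [rng.gauss(c, 0.5) for c in centres for _ in range(n_per_group)]


def fit_three_component_mixture(values, n_iter=100):
    """Fit a 3-component 1-D Gaussian mixture with EM (values must be non-empty).

    Returns (weights, means, sds, responsibilities).
    """
    xs = sorted(values)
    n = len(xs)
    # Assumed initialisation: means at the 10th, 50th and 90th percentiles.
    means = [xs[n // 10], xs[n // 2], xs[(9 * n) // 10]]
    spread = (xs[-1] - xs[0]) / 6.0 or 1.0
    sds = [spread] * 3
    weights = [1.0 / 3.0] * 3
    resp = []
    for _ in range(n_iter):
        # E-step: posterior probability of each component for each value.
        resp = []
        for x in values:
            dens = [w * normal_pdf(x, m, s) for w, m, s in zip(weights, means, sds)]
            total = sum(dens) or 1e-300
            resp.append([d / total for d in dens])
        # M-step: re-estimate weight, mean and standard deviation per component.
        for k in range(3):
            nk = sum(r[k] for r in resp) or 1e-300
            weights[k] = nk / len(values)
            means[k] = sum(r[k] * x for r, x in zip(resp, values)) / nk
            var = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, values)) / nk
            sds[k] = max(math.sqrt(var), 1e-3)
    return weights, means, sds, resp
```

Calling `fit_three_component_mixture(simulate_log_fold_changes())` returns, for each value, its posterior membership in each component; posteriors of this kind are what a mixture-model approach could use for false discovery rate inference.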
3.
Genome Res ; 29(8): 1329-1342, 2019 08.
Article in English | MEDLINE | ID: mdl-31201211

ABSTRACT

Genome-wide chromatin accessibility and nucleosome occupancy profiles have been widely investigated, while the long-range dynamics remain poorly studied at the single-cell level. Here, we present a new experimental approach, methyltransferase treatment followed by single-molecule long-read sequencing (MeSMLR-seq), for long-range mapping of nucleosomes and chromatin accessibility at single DNA molecules and thus achieve comprehensive-coverage characterization of the corresponding heterogeneity. MeSMLR-seq offers direct measurements of both nucleosome-occupied and nucleosome-evicted regions on a single DNA molecule, which is challenging for many existing methods. We applied MeSMLR-seq to haploid yeast, where single DNA molecules represent single cells, and thus we could investigate the combinatorics of many (up to 356) nucleosomes at long range in single cells. We illustrated the differential organization principles of nucleosomes surrounding the transcription start site for silent and actively transcribed genes, at the single-cell level and in the long-range scale. The heterogeneous patterns of chromatin status spanning multiple genes were phased. Together with single-cell RNA-seq data, we quantitatively revealed how chromatin accessibility correlated with gene transcription positively in a highly heterogeneous scenario. Moreover, we quantified the openness of promoters and investigated the coupled chromatin changes of adjacent genes at single DNA molecules during transcription reprogramming. In addition, we revealed the coupled changes of chromatin accessibility for two neighboring glucose transporter genes in response to changes in glucose concentration.


Subject(s)
Euchromatin/metabolism , Gene Expression Regulation, Fungal , Histones/genetics , Saccharomyces cerevisiae/genetics , Transcription, Genetic , Chromosome Mapping , DNA, Fungal/genetics , DNA, Fungal/metabolism , Euchromatin/chemistry , Glucose/metabolism , Glucose Transport Proteins, Facilitative/genetics , Glucose Transport Proteins, Facilitative/metabolism , High-Throughput Nucleotide Sequencing , Histones/metabolism , Methyltransferases/chemistry , Monosaccharide Transport Proteins/genetics , Monosaccharide Transport Proteins/metabolism , Nucleosomes/chemistry , Nucleosomes/metabolism , Promoter Regions, Genetic , Saccharomyces cerevisiae/metabolism , Saccharomyces cerevisiae Proteins/genetics , Saccharomyces cerevisiae Proteins/metabolism , Single-Cell Analysis/methods , Transcription Initiation Site
4.
Bioinformatics ; 37(Suppl_1): i477-i483, 2021 07 12.
Article in English | MEDLINE | ID: mdl-34252938

ABSTRACT

MOTIVATION: Oxford Nanopore Technologies sequencing devices support adaptive sequencing, in which undesired reads can be ejected from a pore in real time. This feature allows targeted sequencing aided by computational methods for mapping partial reads, rather than complex library preparation protocols. However, existing mapping methods either require a computationally expensive base-calling procedure before using aligners to map partial reads or work well only on small genomes. RESULTS: In this work, we present a new streaming method that can map nanopore raw signals for real-time selective sequencing. Rather than converting read signals to bases, we propose to convert reference genomes to signals and fully operate in the signal space. Our method features a new way to index reference genomes using k-d trees, a novel seed selection strategy and a seed chaining algorithm tailored toward the current signal characteristics. We implemented the method as a tool Sigmap. Then we evaluated it on both simulated and real data and compared it to the state-of-the-art nanopore raw signal mapper Uncalled. Our results show that Sigmap yields comparable performance on mapping yeast simulated raw signals, and better mapping accuracy on mapping yeast real raw signals with a 4.4× speedup. Moreover, our method performed well on mapping raw signals to genomes of size >100 Mbp and correctly mapped 11.49% more real raw signals of green algae, which leads to a significantly higher F1-score (0.9354 versus 0.8660). AVAILABILITY AND IMPLEMENTATION: Sigmap code is accessible at https://github.com/haowenz/sigmap. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Nanopores , Algorithms , Genome , High-Throughput Nucleotide Sequencing , Sequence Analysis, DNA , Software
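
The Sigmap abstract above describes indexing reference genomes in signal space with k-d trees. The following is a minimal, generic k-d tree sketch of that kind of index, not Sigmap's implementation: it slices a numeric signal into fixed-width windows, stores them as points, and answers nearest-neighbour queries. The window width, the helper names and the idea of using raw window values as coordinates are assumptions for illustration; Sigmap's actual seeding and chaining are not reproduced here.

```python
def signal_windows(signal, width):
    """Slice a signal into overlapping windows, returned as tuples (tree points)."""
    return [tuple(signal[i:i + width]) for i in range(len(signal) - width + 1)]


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def build_kdtree(points, depth=0):
    """Recursively build a k-d tree; each node is (point, left, right)."""
    if not points:
        return None
    axis = depth % len(points[0])
    ordered = sorted(points, key=lambda p: p[axis])
    mid = len(ordered) // 2
    return (
        ordered[mid],
        build_kdtree(ordered[:mid], depth + 1),
        build_kdtree(ordered[mid + 1:], depth + 1),
    )


def nearest(node, query, depth=0, best=None):
    """Return (squared_distance, point) for the stored point closest to query."""
    if node is None:
        return best
    point, left, right = node
    dist = squared_distance(point, query)
    if best is None or dist < best[0]:
        best = (dist, point)
    axis = depth % len(query)
    diff = query[axis] - point[axis]
    near, far = (left, right) if diff < 0 else (right, left)
    best = nearest(near, query, depth + 1, best)
    # Visit the far side only if the splitting plane is closer than the best match.
    if diff * diff < best[0]:
        best = nearest(far, query, depth + 1, best)
    return best
```

In use, `build_kdtree(signal_windows(reference_signal, 4))` would index a hypothetical reference signal once, and `nearest(tree, read_window)` would then look up the closest reference window for each window of an incoming read without scanning every stored point.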
5.
Proc Natl Acad Sci U S A ; 116(35): 17470-17479, 2019 08 27.
Article in English | MEDLINE | ID: mdl-31395738

ABSTRACT

The most frequently mutated protein in human cancer is p53, a transcription factor (TF) that regulates myriad genes instrumental in diverse cellular outcomes including growth arrest and cell death. Cell context-dependent p53 modulation is critical for this life-or-death balance, yet remains incompletely understood. Here we identify sequence signatures enriched in genomic p53-binding sites modulated by the transcription cofactor iASPP. Moreover, our p53-iASPP crystal structure reveals that iASPP displaces the p53 L1 loop (which mediates sequence-specific interactions with the signature-corresponding base) without perturbing other DNA-recognizing modules of the p53 DNA-binding domain. A TF commonly uses multiple structural modules to recognize its cognate DNA, and thus this mechanism of a cofactor fine-tuning TF-DNA interactions through targeting a particular module is likely widespread. Previously, all tumor suppressors and oncoproteins that associate with the p53 DNA-binding domain, except the oncogenic E6 from human papillomaviruses (HPVs), structurally cluster at the DNA-binding site of p53, complicating drug design. By contrast, iASPP inhibits p53 through a distinct surface overlapping the E6 footprint, opening prospects for p53-targeting precision medicine to improve cancer therapy.


Subject(s)
DNA/genetics , DNA/metabolism , Intracellular Signaling Peptides and Proteins/metabolism , Repressor Proteins/metabolism , Response Elements , Tumor Suppressor Protein p53/metabolism , Base Sequence , Binding Sites , Cell Line, Tumor , DNA/chemistry , Gene Expression Profiling , Humans , Intracellular Signaling Peptides and Proteins/chemistry , Models, Molecular , Nucleotide Motifs , Oncogene Proteins, Viral/chemistry , Oncogene Proteins, Viral/metabolism , Protein Binding , Protein Conformation , Repressor Proteins/chemistry , Structure-Activity Relationship , Tumor Suppressor Protein p53/chemistry
6.
Brief Bioinform ; 20(6): 2306-2315, 2019 11 27.
Article in English | MEDLINE | ID: mdl-30239581

ABSTRACT

Intra-tumor heterogeneity is associated with cancer progression and therapeutic resistance, such as in breast cancer. While the existing methods for studying tumor heterogeneity only analyze variant allele frequency (VAF), the genotype of a variant, which can be detected by long reads or paired-end reads, is also informative for inferring subclones. We developed GenoClone to integrate VAF with variant genotype; it showed superior performance in inferring the number of subclones, estimating the fractions of subclones and identifying the somatic single-nucleotide variant composition of subclones. When GenoClone was applied to 389 TCGA breast cancer samples, it revealed extensive intra-tumor heterogeneity. We further found that a few somatic mutations were relevant to the late stage of tumor evolution, including those in the oncogene PIK3CA and the tumor suppressor gene TP53. Moreover, 52 subclones that were identified from 167 samples shared high similarity of somatic mutations and were clustered into three groups with sizes of 24, 14 and 14. This is helpful for understanding the development of breast cancer in certain subgroups of people and for drug development at the population level. Furthermore, GenoClone also identified tumor heterogeneity in different aliquots of the same samples. The implementation of GenoClone is available at http://www.healthcare.uiowa.edu/labs/au/GenoClone/.


Subject(s)
Breast Neoplasms/pathology , Genetic Linkage , Germ-Line Mutation , Breast Neoplasms/genetics , Class I Phosphatidylinositol 3-Kinases/genetics , Female , Genotype , Humans , Monte Carlo Method , Polymorphism, Single Nucleotide , Tumor Suppressor Protein p53/genetics
7.
Bioinformatics ; 34(13): 2168-2176, 2018 07 01.
Article in English | MEDLINE | ID: mdl-29905763

ABSTRACT

Motivation: In recent years, long read (LR) sequencing technologies, such as Pacific Biosciences and Oxford Nanopore Technologies, have been demonstrated to substantially improve the quality of genome assembly and transcriptome characterization. Compared to the high cost of genome assembly by LR sequencing, it is more affordable to generate LRs for transcriptome characterization. That is, when informative transcriptome LR data are available without a high-quality genome, a method for de novo transcriptome assembly and annotation is in high demand. Results: Without a reference genome, IDP-denovo performs de novo transcriptome assembly, isoform annotation and quantification by integrating the strengths of LRs and short reads. Using the GM12878 human data as a gold standard, we demonstrated that IDP-denovo had superior sensitivity of transcript assembly and high accuracy of isoform annotation. In addition, IDP-denovo outputs two abundance indices to provide a comprehensive expression profile of genes/isoforms. IDP-denovo represents a robust approach for transcriptome assembly, isoform annotation and quantification for non-model organism studies. Applying IDP-denovo to a non-model organism, Dendrobium officinale, we discovered a number of novel genes and novel isoforms that were not reported by the existing annotation library. These results reveal the high diversity of gene isoforms in D. officinale, which was not reported in the existing annotation library. Availability and implementation: The dataset of Dendrobium officinale used/analyzed during the current study has been deposited in SRA, with accession code SRP094520. IDP-denovo is available for download at www.healthcare.uiowa.edu/labs/au/IDP-denovo/. Supplementary information: Supplementary data are available at Bioinformatics online.


Subject(s)
Alternative Splicing , Gene Expression Profiling/methods , Gene Library , Dendrobium/genetics , High-Throughput Nucleotide Sequencing/methods , Humans , Sequence Analysis, RNA/methods
8.
Nucleic Acids Res ; 45(5): e32, 2017 03 17.
Article in English | MEDLINE | ID: mdl-27899656

ABSTRACT

Allele-specific expression (ASE) is a fundamental problem in studying gene regulation and diploid transcriptome profiles, with two key challenges: (i) haplotyping and (ii) estimation of ASE at the gene isoform level. Existing ASE analysis methods are limited by a dependence on haplotyping from laborious experiments or extra genome/family trio data. In addition, there is a lack of methods for gene isoform level ASE analysis. We developed a tool, IDP-ASE, for full ASE analysis. By innovative integration of Third Generation Sequencing (TGS) long reads with Second Generation Sequencing (SGS) short reads, the accuracy of haplotyping and ASE quantification at the gene and gene isoform level was greatly improved, as demonstrated with the gold-standard GM12878 data and semi-simulation data. In addition to methodology development, applications of IDP-ASE to human embryonic stem cells and breast cancer cells indicate that the imbalance of ASE and non-uniformity of gene isoform ASE is widespread, including in tumorigenesis-relevant genes and pluripotency markers. These results show that gene isoform expression and allele-specific expression cooperate to provide high diversity and complexity of gene regulation and expression, highlighting the importance of studying ASE at the gene isoform level. Our study provides a robust bioinformatics solution to understand ASE using RNA sequencing data only.


Subject(s)
Alleles , Haplotypes , High-Throughput Nucleotide Sequencing/methods , RNA Isoforms/genetics , RNA, Messenger/genetics , Transcriptome , Gene Expression Regulation , Human Embryonic Stem Cells/cytology , Human Embryonic Stem Cells/metabolism , Humans , MCF-7 Cells , RNA Isoforms/metabolism , RNA, Messenger/metabolism , Sequence Analysis, RNA
9.
Nucleic Acids Res ; 43(18): e116, 2015 Oct 15.
Article in English | MEDLINE | ID: mdl-26040699

ABSTRACT

We developed an innovative hybrid sequencing approach, IDP-fusion, to detect fusion genes, determine fusion sites and identify and quantify fusion isoforms. IDP-fusion is the first method to study gene fusion events by integrating Third Generation Sequencing long reads and Second Generation Sequencing short reads. We applied IDP-fusion to PacBio data and Illumina data from the MCF-7 breast cancer cells. Compared with the existing tools, IDP-fusion detects fusion genes at higher precision and a very low false positive rate. The results show that IDP-fusion will be useful for unraveling the complexity of multiple fusion splices and fusion isoforms within tumorigenesis-relevant fusion genes.


Subject(s)
Carcinogenesis/genetics , Gene Expression Profiling , Gene Fusion , High-Throughput Nucleotide Sequencing/methods , Breast Neoplasms/genetics , Breast Neoplasms/metabolism , Female , Humans , MCF-7 Cells , Protein Isoforms/genetics , Protein Isoforms/metabolism , Sequence Alignment
10.
Plant J ; 82(6): 951-961, 2015 Jun.
Article in English | MEDLINE | ID: mdl-25912611

ABSTRACT

Danshen, Salvia miltiorrhiza Bunge, is one of the most widely used herbs in traditional Chinese medicine, wherein its rhizome/roots are particularly valued. The corresponding bioactive components include the tanshinone diterpenoids, the biosynthesis of which is a subject of considerable interest. Previous investigations of the S. miltiorrhiza transcriptome have relied on short-read next-generation sequencing (NGS) technology, and the vast majority of the resulting isotigs do not represent full-length cDNA sequences. Moreover, these efforts have been targeted at either whole plants or hairy root cultures. Here, we demonstrate that the tanshinone pigments are produced and accumulate in the root periderm, and apply a combination of NGS and single-molecule real-time (SMRT) sequencing to various root tissues, particularly including the periderm, to provide a more complete view of the S. miltiorrhiza transcriptome, with further insight into tanshinone biosynthesis as well. In addition, the use of SMRT long-read sequencing offered the ability to examine alternative splicing, which was found to occur in approximately 40% of the detected gene loci, including several involved in isoprenoid/terpenoid metabolism.


Subject(s)
Abietanes/biosynthesis , Alternative Splicing , Plant Roots/genetics , Salvia miltiorrhiza/genetics , Abietanes/metabolism , Gene Expression Profiling/methods , Gene Expression Regulation, Plant , High-Throughput Nucleotide Sequencing/methods , Plant Proteins/genetics , Plant Proteins/metabolism , Plant Roots/metabolism , Salvia miltiorrhiza/metabolism , Sequence Analysis, DNA/methods , Transcriptome
11.
Genome Res ; 23(1): 201-16, 2013 Jan.
Article in English | MEDLINE | ID: mdl-22960373

ABSTRACT

The Xenopus embryo has provided key insights into fate specification, the cell cycle, and other fundamental developmental and cellular processes, yet a comprehensive understanding of its transcriptome is lacking. Here, we used paired end RNA sequencing (RNA-seq) to explore the transcriptome of Xenopus tropicalis in 23 distinct developmental stages. We determined expression levels of all genes annotated in RefSeq and Ensembl and showed for the first time on a genome-wide scale that, despite a general state of transcriptional silence in the earliest stages of development, approximately 150 genes are transcribed prior to the midblastula transition. In addition, our splicing analysis uncovered more than 10,000 novel splice junctions at each stage and revealed that many known genes have additional unannotated isoforms. Furthermore, we used Cufflinks to reconstruct transcripts from our RNA-seq data and found that ∼13.5% of the final contigs are derived from novel transcribed regions, both within introns and in intergenic regions. We then developed a filtering pipeline to separate protein-coding transcripts from noncoding RNAs and identified a confident set of 6686 noncoding transcripts in 3859 genomic loci. Since the current reference genome, XenTro3, consists of hundreds of scaffolds instead of full chromosomes, we also performed de novo reconstruction of the transcriptome using Trinity and uncovered hundreds of transcripts that are missing from the genome. Collectively, our data will not only aid in completing the assembly of the Xenopus tropicalis genome but will also serve as a valuable resource for gene discovery and for unraveling the fundamental mechanisms of vertebrate embryogenesis.


Subject(s)
Gene Expression Regulation, Developmental , Sequence Analysis, RNA , Transcriptome , Xenopus/genetics , Animals , Ecthyma, Contagious , Embryo, Nonmammalian/metabolism , Introns , Larva/genetics , Larva/metabolism , Physical Chromosome Mapping , RNA Splicing , RNA, Untranslated , Sequence Alignment , Xenopus/growth & development
12.
Proc Natl Acad Sci U S A ; 110(50): E4821-30, 2013 Dec 10.
Article in English | MEDLINE | ID: mdl-24282307

ABSTRACT

Although transcriptional and posttranscriptional events are detected in RNA-Seq data from second-generation sequencing, full-length mRNA isoforms are not captured. On the other hand, third-generation sequencing, which yields much longer reads, has current limitations of lower raw accuracy and throughput. Here, we combine second-generation sequencing and third-generation sequencing with a custom-designed method for isoform identification and quantification to generate a high-confidence isoform dataset for human embryonic stem cells (hESCs). We report 8,084 RefSeq-annotated isoforms detected as full-length and an additional 5,459 isoforms predicted through statistical inference. Over one-third of these are novel isoforms, including 273 RNAs from gene loci that have not previously been identified. Further characterization of the novel loci indicates that a subset is expressed in pluripotent cells but not in diverse fetal and adult tissues; moreover, their reduced expression perturbs the network of pluripotency-associated genes. Results suggest that gene identification, even in well-characterized human cell lines and tissues, is likely far from complete.


Subject(s)
Alternative Splicing/genetics , Embryonic Stem Cells/metabolism , Gene Expression Profiling/methods , High-Throughput Nucleotide Sequencing/methods , Protein Isoforms/genetics , Transcriptome/genetics , Embryonic Stem Cells/chemistry , Humans , Male
13.
Mol Syst Biol ; 9: 632, 2013.
Article in English | MEDLINE | ID: mdl-23295861

ABSTRACT

Landmark events occur in a coordinated manner during pre-implantation development of the mammalian embryo, yet the regulatory network that orchestrates these events remains largely unknown. Here, we present the first systematic investigation of the network in pre-implantation mouse embryos using morpholino-mediated gene knockdowns of key embryonic stem cell (ESC) factors followed by detailed transcriptome analysis of pooled embryos, single embryos, and individual blastomeres. We delineated the regulons of Oct4, Sall4, and Nanog and identified a set of metabolism- and transport-related genes that were controlled by these transcription factors in embryos but not in ESCs. Strikingly, the knockdown embryos arrested at a range of developmental stages. We provided evidence that the DNA methyltransferase Dnmt3b has a role in determining the extent to which a knockdown embryo can develop. We further showed that the feed-forward loop comprising Dnmt3b, the pluripotency factors, and the miR-290-295 cluster exemplifies a network motif that buffers embryos against gene expression noise. Our findings indicate that Oct4, Sall4, and Nanog form a robust and integrated network to govern mammalian pre-implantation development.


Subject(s)
Blastocyst/physiology , DNA-Binding Proteins/genetics , Embryonic Stem Cells/physiology , Gene Regulatory Networks , Homeodomain Proteins/genetics , Octamer Transcription Factor-3/genetics , Transcription Factors/genetics , Animals , Blastocyst/metabolism , DNA (Cytosine-5-)-Methyltransferases/genetics , DNA (Cytosine-5-)-Methyltransferases/metabolism , DNA-Binding Proteins/metabolism , Embryo Culture Techniques , Embryo, Mammalian/metabolism , Embryonic Development , Female , Gene Expression Profiling , Gene Expression Regulation, Developmental , Gene Knockdown Techniques , Homeodomain Proteins/metabolism , Male , Mice , Mice, Inbred C57BL , Mice, Inbred DBA , MicroRNAs/genetics , Nanog Homeobox Protein , Octamer Transcription Factor-3/metabolism , Oligonucleotide Array Sequence Analysis , Transcription Factors/metabolism , DNA Methyltransferase 3B
14.
Nat Biotechnol ; 42(4): 591-596, 2024 Apr.
Article in English | MEDLINE | ID: mdl-37349523

ABSTRACT

Current N6-methyladenosine (m6A) mapping methods need large amounts of RNA or are limited to cultured cells. Through optimized sample recovery and signal-to-noise ratio, we developed picogram-scale m6A RNA immunoprecipitation and sequencing (picoMeRIP-seq) for studying m6A in vivo in single cells and scarce cell types using standard laboratory equipment. We benchmark m6A mapping on titrations of poly(A) RNA and embryonic stem cells and in single zebrafish zygotes, mouse oocytes and embryos.


Subject(s)
RNA , Zebrafish , Animals , Mice , Zebrafish/genetics , Zebrafish/metabolism , RNA/genetics , RNA, Messenger/genetics , Embryonic Stem Cells , Cells, Cultured
15.
Nat Struct Mol Biol ; 30(5): 703-709, 2023 05.
Article in English | MEDLINE | ID: mdl-37081317

ABSTRACT

Despite the significance of N6-methyladenosine (m6A) in gene regulation, the requirement for large amounts of RNA has hindered m6A profiling in mammalian early embryos. Here we apply low-input methyl RNA immunoprecipitation and sequencing to map m6A in mouse oocytes and preimplantation embryos. We define the landscape of m6A during the maternal-to-zygotic transition, including stage-specifically expressed transcription factors essential for cell fate determination. Both the maternally inherited transcripts to be degraded post fertilization and the zygotically activated genes during zygotic genome activation are widely marked by m6A. In contrast to m6A-marked zygotically activated genes, m6A-marked maternally inherited transcripts have a higher tendency to be targeted by microRNAs. Moreover, RNAs derived from retrotransposons, such as MTA that is maternally expressed and MERVL that is transcriptionally activated at the two-cell stage, are largely marked by m6A. Our results provide a foundation for future studies exploring the regulatory roles of m6A in mammalian early embryonic development.


Subject(s)
Gene Expression Regulation, Developmental , MicroRNAs , Animals , Mice , Blastocyst , Oocytes/metabolism , Embryonic Development/genetics , Zygote , MicroRNAs/metabolism , Mammals/genetics
16.
Nucleic Acids Res ; 38(14): 4570-8, 2010 Aug.
Article in English | MEDLINE | ID: mdl-20371516

ABSTRACT

Alternative splicing is a prevalent post-transcriptional process, which is not only important to normal cellular function but is also involved in human diseases. The newly developed second generation sequencing technique provides high-throughput data (RNA-seq data) to study alternative splicing events in different types of cells. Here, we present a computational method, SpliceMap, to detect splice junctions from RNA-seq data. This method does not depend on any existing annotation of gene structures and is capable of finding novel splice junctions with high sensitivity and specificity. It can handle long reads (50-100 nt) and can exploit paired-read information to improve mapping accuracy. Several parameters are included in the output to indicate the reliability of the predicted junction and help filter out false predictions. We applied SpliceMap to analyze 23 million paired 50-nt reads from human brain tissue. The results show that, at this depth of sequencing, RNA-seq can support reliable detection of splice junctions except for those that are present at very low levels. Compared to current methods, SpliceMap can achieve 12% higher sensitivity without sacrificing specificity.


Subject(s)
Alternative Splicing , RNA Splice Sites , Sequence Analysis, RNA , Software , Algorithms , Computational Biology/methods , Humans , Polymerase Chain Reaction
17.
Nat Biotechnol ; 39(11): 1348-1365, 2021 11.
Article in English | MEDLINE | ID: mdl-34750572

ABSTRACT

Rapid advances in nanopore technologies for sequencing single long DNA and RNA molecules have led to substantial improvements in accuracy, read length and throughput. These breakthroughs have required extensive development of experimental and bioinformatics methods to fully exploit nanopore long reads for investigations of genomes, transcriptomes, epigenomes and epitranscriptomes. Nanopore sequencing is being applied in genome assembly, full-length transcript detection and base modification detection and in more specialized areas, such as rapid clinical diagnoses and outbreak surveillance. Many opportunities remain for improving data quality and analytical approaches through the development of new nanopores, base-calling methods and experimental protocols tailored to particular applications.


Subject(s)
Nanopore Sequencing , Nanopores , Computational Biology , High-Throughput Nucleotide Sequencing/methods , Sequence Analysis, DNA/methods , Technology
18.
Nat Commun ; 12(1): 1361, 2021 03 01.
Article in English | MEDLINE | ID: mdl-33649327

ABSTRACT

Sperm contributes diverse RNAs to the zygote. While sperm small RNAs have been shown to impact offspring phenotypes, our knowledge of the sperm transcriptome, especially the composition of long RNAs, has been limited by the lack of sensitive, high-throughput experimental techniques that can distinguish intact RNAs from fragmented RNAs, known to abound in sperm. Here, we integrate single-molecule long-read sequencing with short-read sequencing to detect sperm intact RNAs (spiRNAs). We identify 3440 spiRNA species in mice and 4100 in humans. The spiRNA profile consists of both mRNAs and long non-coding RNAs, is evolutionarily conserved between mice and humans, and displays an enrichment in mRNAs encoding for ribosome. In sum, we characterize the landscape of intact long RNAs in sperm, paving the way for future studies on their biogenesis and functions. Our experimental and bioinformatics approaches can be applied to other tissues and organisms to detect intact transcripts.


Subject(s)
Conserved Sequence/genetics , High-Throughput Nucleotide Sequencing/methods , RNA/genetics , Single Molecule Imaging , Spermatozoa/metabolism , Animals , Evolution, Molecular , Gene Ontology , Humans , Male , Mice, Inbred C57BL , RNA/metabolism , RNA, Long Noncoding/genetics , RNA, Long Noncoding/metabolism , RNA, Messenger/genetics , RNA, Messenger/metabolism , Ribosomes/metabolism , Testis/metabolism , Transcriptome/genetics
19.
Genome Biol ; 21(1): 14, 2020 01 17.
Article in English | MEDLINE | ID: mdl-31952552

ABSTRACT

The error-prone third-generation sequencing (TGS) long reads can be corrected by the high-quality second-generation sequencing (SGS) short reads, which is referred to as hybrid error correction. We here investigate the influences of the principal algorithmic factors of two major types of hybrid error correction methods by mathematical modeling and analysis on both simulated and real data. Our study reveals the distribution of accuracy gain with respect to the original long read error rate. We also demonstrate that the original error rate of 19% is the limit for perfect correction, beyond which long reads are too error-prone to be corrected by these methods.


Subject(s)
High-Throughput Nucleotide Sequencing/methods , Sequence Alignment , Algorithms
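
The abstract above concerns hybrid error correction, in which accurate short reads are used to correct error-prone long reads. As a deliberately simplified illustration of the alignment-based flavour of that idea, the toy sketch below corrects substitutions in a long read by majority vote over short reads whose alignment offsets are assumed to be already known. The function name, the `min_votes` threshold and the input format are hypothetical, and insertions and deletions, which real correction tools must handle, are ignored.

```python
from collections import Counter


def correct_long_read(long_read, aligned_short_reads, min_votes=2):
    """Correct substitutions in a long read by majority vote of aligned short reads.

    aligned_short_reads is a list of (offset, sequence) pairs, where offset is the
    0-based position on the long read at which the short read starts.
    """
    votes = [Counter() for _ in long_read]
    for offset, seq in aligned_short_reads:
        for i, base in enumerate(seq):
            pos = offset + i
            if 0 <= pos < len(long_read):
                votes[pos][base] += 1
    corrected = []
    for original, counter in zip(long_read, votes):
        if counter:
            base, count = counter.most_common(1)[0]
            # Replace the base only when enough short reads agree on it.
            corrected.append(base if count >= min_votes else original)
        else:
            # No short-read coverage: keep the long-read base unchanged.
            corrected.append(original)
    return "".join(corrected)
```

Positions with little or no short-read support are left untouched, so correction quality in this toy depends directly on short-read coverage and on the offsets being right.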
20.
Genome Biol ; 20(1): 26, 2019 02 04.
Article in English | MEDLINE | ID: mdl-30717772

ABSTRACT

BACKGROUND: Third-generation sequencing technologies have advanced the progress of the biological research by generating reads that are substantially longer than second-generation sequencing technologies. However, their notorious high error rate impedes straightforward data analysis and limits their application. A handful of error correction methods for these error-prone long reads have been developed to date. The output data quality is very important for downstream analysis, whereas computing resources could limit the utility of some computing-intense tools. There is a lack of standardized assessments for these long-read error-correction methods. RESULTS: Here, we present a comparative performance assessment of ten state-of-the-art error-correction methods for long reads. We established a common set of benchmarks for performance assessment, including sensitivity, accuracy, output rate, alignment rate, output read length, run time, and memory usage, as well as the effects of error correction on two downstream applications of long reads: de novo assembly and resolving haplotype sequences. CONCLUSIONS: Taking into account all of these metrics, we provide a suggestive guideline for method choice based on available data size, computing resources, and individual research goals.


Subject(s)
Genomics/methods , Sequence Analysis, DNA , Software/statistics & numerical data , Animals , Arabidopsis , Drosophila melanogaster , Escherichia coli , Saccharomyces cerevisiae , Scientific Experimental Error , Sequence Alignment