ABSTRACT
BACKGROUND: Bisulfite sequencing (BS-Seq) is a fundamental technique for characterizing DNA methylation profiles. Genotype calling from bisulfite-converted BS-Seq data allows allele-specific methylation analysis and the concurrent exploration of genetic and epigenetic profiles. Although various methods have been proposed, calling single nucleotide polymorphisms (SNPs) from BS-Seq data remains challenging, particularly for SNPs on chromosome X and in the presence of contaminating data. RESULTS: We introduce bsgenova, a novel SNP caller tailored for bisulfite sequencing data that employs a Bayesian multinomial model. We assess the performance of bsgenova by comparing SNPs called from real-world BS-Seq data with those called from corresponding whole-genome sequencing (WGS) data across three human cell lines. Compared with three existing methods, bsgenova is both sensitive and precise, especially for chromosome X. Moreover, in the presence of low-quality reads, bsgenova notably outperforms the other methods. In addition, the implementation of bsgenova leverages matrix imputation and multi-process parallelization. Compared with existing methods, bsgenova stands out for its speed and its efficiency in memory and disk usage. Furthermore, bsgenova integrates bsextractor, a methylation extractor, enhancing its flexibility and expanding its utility. CONCLUSIONS: We introduce bsgenova for SNP calling from bisulfite-sequencing data. The source code is available at https://github.com/hippo-yf/bsgenova under the GPL-3.0 license.
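To make the idea of a Bayesian multinomial genotype model concrete, the following is a minimal sketch and not bsgenova's actual implementation. It assumes reads from the Watson strand only, a known methylation level `meth`, a uniform sequencing error rate `err`, and a simple prior that favours the homozygous reference genotype. The function names and all parameter values are hypothetical.

```python
import math
from itertools import combinations_with_replacement

BASES = "ACGT"

def emission(allele, meth, err):
    """P(observed base | true allele) on the Watson strand (simplified assumption)."""
    # Bisulfite conversion: an unmethylated C is read as T.
    true = {"C": meth, "T": 1.0 - meth} if allele == "C" else {allele: 1.0}
    # Sequencing error: the read base is wrong with probability err.
    return {b: sum(p * ((1 - err) if b == t else err / 3) for t, p in true.items())
            for b in BASES}

def genotype_posterior(counts, ref, meth=0.5, err=0.01, theta=0.001):
    """Posterior over the 10 diploid genotypes given base counts at one site."""
    log_post = {}
    for g in combinations_with_replacement(BASES, 2):
        e1, e2 = emission(g[0], meth, err), emission(g[1], meth, err)
        # Multinomial log-likelihood (the coefficient is constant across genotypes).
        loglik = sum(n * math.log(0.5 * e1[b] + 0.5 * e2[b])
                     for b, n in counts.items() if n > 0)
        # Hypothetical prior: most sites are homozygous reference.
        prior = 1 - theta if g == (ref, ref) else theta / 9
        log_post[g] = loglik + math.log(prior)
    # Normalize with the log-sum-exp trick for numerical stability.
    m = max(log_post.values())
    z = sum(math.exp(v - m) for v in log_post.values())
    return {g: math.exp(v - m) / z for g, v in log_post.items()}

# A site with balanced A and G reads is called heterozygous A/G.
post_het = genotype_posterior({"A": 10, "G": 10}, ref="A")
call_het = max(post_het, key=post_het.get)

# With unmethylated C, all-T reads cannot separate C/C from T/T on this strand,
# so the prior decides in favour of the reference genotype.
post_ct = genotype_posterior({"T": 20}, ref="C", meth=0.0)
call_ct = max(post_ct, key=post_ct.get)
```

The second call illustrates why C-to-T variants are hard to call from a single strand of bisulfite-converted reads: the likelihood is identical for C/C, C/T and T/T, so only the prior or reads from the opposite strand can resolve the genotype.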
Subject(s)
DNA Methylation, Single Nucleotide Polymorphism, DNA Sequence Analysis, Sulfites, Humans, DNA Methylation/genetics, Sulfites/chemistry, DNA Sequence Analysis/methods, Genotype, Software, Whole Genome Sequencing/methods, Bayes Theorem
ABSTRACT
The molecular mechanisms controlling the transition from meiotic arrest to meiotic resumption in mammalian oocytes have not been fully elucidated. Single-cell omics technology provides a new opportunity to decipher the early molecular events of oocyte growth in mammals. Here we analyzed oocytes collected from antral follicles of different diameters in porcine pubertal ovaries, and used single-cell M&T-seq technology to analyze the nuclear DNA methylome and cytoplasmic transcriptome in parallel for 62 oocytes. 10× Genomics single-cell transcriptomic analyses were also performed to explore the bi-directional cell-cell communications within antral follicles. A new pipeline, methyConcerto, was developed to specifically and comprehensively characterize the methylation profile and allele-specific methylation events for a single-cell methylome. We characterized the gene expression and DNA methylation of individual oocytes in porcine antral follicles. The bodies of both active and inactive genes displayed high methylation levels, which enabled us to define two distinct types of oocytes. Although the methylation levels of Type II were higher than those of Type I, Type II contained nearly twice as many cytoplasmic transcripts as Type I. Moreover, the imprinting methylation patterns of Type II were more dramatically divergent than those of Type I, and the gene expression and DNA methylation of Type II were more similar to those of MII oocytes. The crosstalk between granulosa cells and Type II oocytes was active, and these observations revealed that Type II was more poised for maturation. We further confirmed by in vitro maturation experiments that Insulin Receptor Substrate-1, in the insulin signaling pathway, is a key regulator of maturation. Our study provides new insights into the regulatory mechanisms between meiotic arrest and meiotic resumption in mammalian oocytes. We also provide a new analytical package for future single-cell methylomics studies.
Subject(s)
Multiomics, Oocytes, Female, Swine, Animals, Ovarian Follicle, Cell Nucleus, Cell Cycle, Mammals
ABSTRACT
BACKGROUND: Normalization of RNA-seq data aims at identifying biological expression differentiation between samples by removing the effects of unwanted confounding factors. Explicitly or implicitly, the justification of normalization requires a set of housekeeping genes. However, the existence of housekeeping genes common to a very large collection of samples, especially under a wide range of conditions, is questionable. RESULTS: We propose to carry out pairwise normalization with respect to multiple references, selected from representative samples. The pairwise intermediates are then integrated based on a linear model that adjusts for the reference effects. Motivated by the notion of housekeeping genes and their statistical counterparts, we adopt robust least trimmed squares regression in pairwise normalization. The proposed method (MUREN) is compared with other existing tools on some standard data sets. Our assessment of normalization quality emphasizes the preservation of possible asymmetric differentiation, whose biological significance is exemplified by a single-cell data set on the cell cycle. MUREN is implemented as an R package. The code is available under the GPL-3 license on GitHub (github.com/hippo-yf/MUREN) and on conda (anaconda.org/hippo-yf/r-muren). CONCLUSIONS: MUREN performs RNA-seq normalization using a two-step statistical regression induced from a general principle. We propose that the densities of pairwise differentiations be used to evaluate the goodness of normalization. MUREN adjusts the mode of differentiation toward zero while preserving the skewness due to biological asymmetric differentiation. Moreover, by robustly integrating pre-normalized counts with respect to multiple references, MUREN is immune to individual outlier samples.
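As a rough illustration of why least trimmed squares (LTS) suits pairwise normalization, the sketch below estimates a single log-scale shift between a sample and a reference. It is a simplified, location-only version written for this listing, not the MUREN package's code. The function name, the default subset size, and the toy data are assumptions.

```python
def lts_shift(sample_log, ref_log, h=None):
    """Location-only LTS estimate of the shift between two log-count vectors.

    The estimate is the mean of the h sorted differences with the smallest
    sum of squared deviations. For a one-parameter location model, this
    contiguous-window search gives the exact LTS solution.
    """
    d = sorted(s - r for s, r in zip(sample_log, ref_log))
    n = len(d)
    if h is None:
        h = n // 2 + 1  # assumed default: just over half of the genes
    # Prefix sums make each window's sum of squares O(1) to evaluate.
    s1, s2 = [0.0], [0.0]
    for x in d:
        s1.append(s1[-1] + x)
        s2.append(s2[-1] + x * x)
    best_ss, best_mean = float("inf"), 0.0
    for i in range(n - h + 1):
        total = s1[i + h] - s1[i]
        ss = (s2[i + h] - s2[i]) - total * total / h
        if ss < best_ss:
            best_ss, best_mean = ss, total / h
    return best_mean

# Toy data: six stable genes differ from the reference by a shift of 1.0,
# while four genes are strongly up-regulated (difference of 4.0).
ref = [5.0] * 10
sample = [6.0] * 6 + [9.0] * 4
shift = lts_shift(sample, ref)
normalized = [s - shift for s in sample]
plain_mean = sum(s - r for s, r in zip(sample, ref)) / len(ref)
```

On this toy input the trimmed estimate follows the stable majority, so the up-regulated genes keep their asymmetric difference after normalization. The plain mean of the differences would be pulled toward them.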
Subject(s)
Gene Expression Profiling, Essential Genes, RNA-Seq, RNA Sequence Analysis, Exome Sequencing
ABSTRACT
BACKGROUND: A key problem in systems biology is the determination of the regulatory mechanism corresponding to a phenotype. An empirical approach in this regard is to compare the expression profiles of cells under two conditions or tissues from two phenotypes and to unravel the underlying transcriptional regulation. We have proposed the BASE method to statistically infer the effective regulatory factors that are responsible for the gene expression differentiation with the help of binding data between factors and genes. Usually the protein-DNA binding data are obtained by ChIP-seq experiments, which can be costly and are condition-specific. RESULTS: Here we report a definition of binding strength based on a probability model. Using this condition-free definition, the BASE method needs only the frequencies of cis-motifs in regulatory regions, so the inferences can be carried out in silico. The directional regulation can be inferred by considering down- and up-regulation separately. We show the effectiveness of the approach in one case study. In the study of the effects of polyunsaturated fatty acid (PUFA) diets, namely docosahexaenoic acid (DHA) and eicosapentaenoic acid (EPA), on mouse small intestine cells, the inferred regulations are consistent with those reported in the literature, including PPARα and NFκB, corresponding respectively to enhanced adipogenesis and reduced inflammation. Moreover, we discovered enhanced RORA regulation of circadian rhythm and reduced ETS1 regulation of angiogenesis. CONCLUSIONS: With the probabilistic definition of cis-trans binding affinity, the BASE method can obtain the significance of TF regulation changes corresponding to a gene expression differentiation profile between treatment and control samples. The landscape of the inferred cis-trans regulations is helpful for revealing the underlying molecular mechanisms. In particular, we report a more comprehensive picture of the regulation induced by the EPA and DHA diets.
Subject(s)
Angiogenesis Inducing Agents/administration & dosage, Docosahexaenoic Acids/administration & dosage, Eicosapentaenoic Acid/administration & dosage, Gene Expression Regulation, Hyperlipidemias/genetics, Nucleotide Motifs, Genetic Transcription, Adipogenesis/drug effects, Animals, Hyperlipidemias/drug therapy, Small Intestine/metabolism, Mice, Genetic Promoter Regions
ABSTRACT
Pancreatic islet failure is a key characteristic of type 2 diabetes alongside insulin resistance. To gain molecular insights into the pathology of islets in type 2 diabetes, we developed a computational approach to integrating expression profiles of Goto-Kakizaki and Wistar rat islets from a designed experiment with those of human islets from an observational study. A principal gene-eigenvector in the expression profiles, characterized by up-regulated angiogenesis and down-regulated oxidative phosphorylation, was identified as conserved across the two species. In the case of Goto-Kakizaki versus Wistar islets, this alteration in gene expression can be verified directly by the treatment-control tests over time, and corresponds to the alteration of α/β-cell distribution obtained by quantifying the islet micrographs. Furthermore, the correspondence between the dual sample- and gene-eigenvectors unveils more delicate structures. In the case of rats, the upward and downward trends of insulin mRNA levels before and after week 8 correspond respectively to the top two principal eigenvectors. In the case of humans, the top two principal eigenvectors correspond respectively to the late and early stages of diabetes. According to the aggregated expression signature, a large portion of genes involved in the hypoxia-inducible factor signaling pathway, which activates transcription of angiogenesis, were significantly up-regulated. Furthermore, the top-ranked anti-angiogenic genes THBS1 and PEDF indicate the existence of a counteractive mechanism that is in line with the thickened and fragmented capillaries found in the deteriorated islets. Overall, the integrative analysis unravels the principal transcriptional alterations underlying the deterioration of islet morphology and insulin secretion during type 2 diabetes progression.