1.
Nucleic Acids Res ; 50(5): 2452-2463, 2022 Mar 21.
Article in English | MEDLINE | ID: mdl-35188540

ABSTRACT

Accelerated evolution of any portion of the genome is of significant interest, potentially signaling positive selection of phenotypic traits and adaptation. Accelerated evolution remains understudied for structured RNAs, despite the fact that an RNA's structure is often key to its function. RNA structures are typically characterized by compensatory (structure-preserving) basepair changes that are unexpected given the underlying sequence variation, i.e., they have evolved through negative selection on structure. We address the question of how fast the primary sequence of an RNA can change through evolution while conserving its structure. Specifically, we consider predicted and known structures in vertebrate genomes. After careful control of false discovery rates, we obtain 13 de novo structures (and three known Rfam structures) that we predict to have rapidly evolving sequences, defined as structures where the primary sequences of human and mouse have diverged at least twice as fast (1.5 times for Rfam) as nearby neutrally evolving sequences. Two of the three known structures function in translation inhibition related to infection and immune response. We conclude that rapid sequence divergence does not preclude RNA structure conservation in vertebrates, although these events are relatively rare.


Subjects
Genome , RNA , Animals , Evolution, Molecular , Mice , Phylogeny , RNA/chemistry , RNA/genetics , Vertebrates/genetics
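
The "at least twice as fast" criterion in the abstract above can be illustrated with a minimal sketch. It assumes that divergence is approximated by the raw mismatch fraction over ungapped alignment columns, which is a simplification chosen here for illustration and not necessarily the estimator used in the paper. The function names `divergence` and `is_rapidly_evolving` are hypothetical.

```python
def divergence(seq_a: str, seq_b: str) -> float:
    """Fraction of ungapped aligned columns where the two sequences differ.

    Assumption: a raw mismatch fraction stands in for an evolutionary
    rate estimate; real analyses typically use a substitution model.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    mismatches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            continue  # skip alignment gaps
        compared += 1
        if a != b:
            mismatches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return mismatches / compared


def is_rapidly_evolving(structure_pair, neutral_pair, fold=2.0):
    """True if the structured region diverged at least `fold` times as much
    as the nearby neutral region (fold=2.0 for de novo, 1.5 for Rfam)."""
    d_structure = divergence(*structure_pair)
    d_neutral = divergence(*neutral_pair)
    if d_neutral == 0:
        raise ValueError("neutral divergence is zero; ratio is undefined")
    return d_structure >= fold * d_neutral
```

For example, `is_rapidly_evolving((human_rna, mouse_rna), (human_flank, mouse_flank))` would compare a hypothetical human-mouse structure alignment against an alignment of flanking sequence assumed to evolve neutrally.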
2.
NAR Genom Bioinform ; 3(2): lqab046, 2021 Jun.
Article in English | MEDLINE | ID: mdl-34056596

ABSTRACT

The analysis of mRNA transcript abundance with RNA-Seq is a central tool in molecular biology research, but often analyses fail to account for the uncertainty in these estimates, which can be significant, especially when trying to disentangle isoforms or duplicated genes. Preserving uncertainty necessitates a full probabilistic model of all the sequencing reads, which quickly becomes intractable, as experiments can consist of billions of reads. To overcome these limitations, we propose a new method of approximating the likelihood function of a sparse mixture model, using a technique we call the Pólya tree transformation. We demonstrate that substituting this approximation for the real thing achieves most of the benefits with a fraction of the computational costs, leading to more accurate detection of differential transcript expression and transcript coexpression.

3.
Sci Rep ; 10(1): 3490, 2020 Feb 26.
Article in English | MEDLINE | ID: mdl-32103057

ABSTRACT

Spatial heterogeneity is a fundamental feature of the tumor microenvironment (TME), and tackling spatial heterogeneity in neoplastic metabolic aberrations is critical for tumor treatment. Genome-scale metabolic network models have been used successfully to simulate cancer metabolic networks. However, most models use bulk gene expression data of entire tumor biopsies, ignoring spatial heterogeneity in the TME. To account for spatial heterogeneity, we performed spatially-resolved metabolic network modeling of the prostate cancer microenvironment. We discovered novel malignant-cell-specific metabolic vulnerabilities targetable by small molecule compounds. We predicted that inhibiting the fatty acid desaturase SCD1 may selectively kill cancer cells based on our discovery of spatial separation of fatty acid synthesis and desaturation. We also uncovered higher prostaglandin metabolic gene expression in the tumor, relative to the surrounding tissue. Therefore, we predicted that inhibiting the prostaglandin transporter SLCO2A1 may selectively kill cancer cells. Importantly, SCD1 and SLCO2A1 have been previously shown to be potently and selectively inhibited by compounds such as CAY10566 and suramin, respectively. We also uncovered cancer-selective metabolic liabilities in central carbon, amino acid, and lipid metabolism. Our novel cancer-specific predictions provide new opportunities to develop selective drug targets for prostate cancer and other cancers where spatial transcriptomics datasets are available.


Subjects
Metabolic Networks and Pathways/genetics , Prostatic Neoplasms/pathology , Arachidonic Acid/metabolism , Cysteine/metabolism , Databases, Factual , Humans , Male , Organic Anion Transporters/antagonists & inhibitors , Organic Anion Transporters/metabolism , Prostatic Neoplasms/metabolism , Stearoyl-CoA Desaturase/antagonists & inhibitors , Stearoyl-CoA Desaturase/metabolism , Succinic Acid/metabolism , Suramin/chemistry , Suramin/metabolism , Tumor Microenvironment
4.
Dev Cell ; 52(2): 236-250.e7, 2020 Jan 27.
Article in English | MEDLINE | ID: mdl-31991105

ABSTRACT

Regulation of embryonic diapause, dormancy that interrupts the tight connection between developmental stage and time, is still poorly understood. Here, we characterize the transcriptional and metabolite profiles of mouse diapause embryos and identify unique gene expression and metabolic signatures with activated lipolysis, glycolysis, and metabolic pathways regulated by AMPK. Lipolysis is increased due to mTORC2 repression, increasing fatty acids to support cell survival. We further show that starvation in pre-implantation ICM-derived mouse ESCs induces a reversible dormant state, transcriptionally mimicking the in vivo diapause stage. During starvation, Lkb1, an upstream kinase of AMPK, represses mTOR, which induces a reversible glycolytic and epigenetically H4K16Ac-negative, diapause-like state. Diapause furthermore activates expression of glutamine transporters SLC38A1/2. We show by genetic and small molecule inhibitors that glutamine transporters are essential for the H4K16Ac-negative, diapause state. These data suggest that mTORC1/2 inhibition, regulated by amino acid levels, is causal for diapause metabolism and epigenetic state.


Subjects
Amino Acid Transport System A/metabolism , Blastocyst/metabolism , Embryo, Mammalian/cytology , Mechanistic Target of Rapamycin Complex 2/metabolism , Protein Serine-Threonine Kinases/metabolism , AMP-Activated Protein Kinases , Animals , Cell Proliferation/genetics , Cell Proliferation/physiology , Embryonic Stem Cells/cytology , Gene Knockout Techniques , Mice
5.
BMC Genomics ; 19(1): 899, 2018 Dec 11.
Article in English | MEDLINE | ID: mdl-30537930

ABSTRACT

BACKGROUND: Comparative genomics approaches have facilitated the discovery of many novel non-coding and structured RNAs (ncRNAs). The increasing availability of related genomes now makes it possible to systematically search for compensatory base changes, and thus for conserved secondary structures, even in genomic regions that are poorly alignable in the primary sequence. The wealth of available transcriptome data can add valuable insight into expression and possible function for new ncRNA candidates. Earlier work identifying ncRNAs in Drosophila melanogaster made use of sequence-based alignments and employed a sliding window approach, inevitably biasing identification toward RNAs encoded in the more conserved parts of the genome. RESULTS: To search for conserved RNA structures (CRSs) that may not be highly conserved in sequence and to assess the expression of CRSs, we conducted a genome-wide structural alignment screen of 27 insect genomes including D. melanogaster and integrated this with an extensive set of tiling array data. The structural alignment screen revealed ∼30,000 novel candidate CRSs at an estimated false discovery rate of less than 10%. With more than one quarter of all individual CRS motifs showing sequence identities below 60%, the predicted CRSs largely complement the findings of sliding window approaches applied previously. While a sixth of the CRSs were ubiquitously expressed, we found that most were expressed in specific developmental stages or cell lines. Notably, the most statistically significant enrichment of CRSs was observed in pupae, mainly in exons of untranslated regions, promoters, enhancers, and long ncRNAs. Interestingly, cell lines were found to express a different set of CRSs than were found in vivo. Only a small fraction of intergenic CRSs were co-expressed with the adjacent protein coding genes, which suggests that most intergenic CRSs are independent genetic units.
CONCLUSIONS: This study provides a more comprehensive view of the ncRNA transcriptome in fly as well as evidence for differential expression of CRSs during development and in cell lines.


Subjects
Conserved Sequence , Drosophila melanogaster/genetics , RNA/chemistry , Animals , Base Composition/genetics , Base Sequence , Drosophila melanogaster/growth & development , Gene Expression Regulation , Molecular Sequence Annotation , RNA, Untranslated/genetics , Software
6.
Sci Rep ; 8(1): 10492, 2018 Jul 12.
Article in English | MEDLINE | ID: mdl-30002405

ABSTRACT

Sexual reproduction roots the eukaryotic tree of life, although its loss occurs across diverse taxa. Asexual reproduction and clonal lineages persist in these taxa despite theoretical arguments suggesting that individual clones should be evolutionarily short-lived due to limited phenotypic diversity. Here, we present quantitative evidence that an obligate asexual lineage emerged from a sexual population of the marine diatom Thalassiosira pseudonana and rapidly expanded throughout the world's oceans. Whole genome comparisons identified two lineages with characteristics expected of sexually reproducing strains in Hardy-Weinberg equilibrium. A third lineage displays genomic signatures for the functional loss of sexual reproduction followed by a recent global colonization by a single ancestral genotype. Extant members of this lineage are genetically differentiated and phenotypically plastic, potentially allowing for rapid adaptation when they are challenged by natural selection. Such mechanisms may be expected to generate new clones within marginal populations of additional unicellular species, facilitating the exploration and colonization of novel environments, aided by exponential growth and ease of dispersal.


Subjects
Diatoms/genetics , Evolution, Molecular , Microalgae/genetics , Reproduction, Asexual/genetics , Selection, Genetic , Oceans and Seas , Phylogeny
7.
Cell Rep ; 20(7): 1597-1608, 2017 Aug 15.
Article in English | MEDLINE | ID: mdl-28813672

ABSTRACT

We analyzed chromatin dynamics and transcriptional activity of human embryonic stem cell (hESC)-derived cardiac progenitor cells (CPCs) and KDR+/CD34+ endothelial cells generated from different mesodermal origins. Using an unbiased algorithm to hierarchically rank genes modulated at the level of chromatin and transcription, we identified candidate regulators of mesodermal lineage determination. HOPX, a non-DNA-binding homeodomain protein, was identified as a candidate regulator of blood-forming endothelial cells. Using HOPX reporter and knockout hESCs, we show that HOPX regulates blood formation. Loss of HOPX does not impact endothelial fate specification but markedly reduces primitive hematopoiesis, acting at least in part through failure to suppress Wnt/β-catenin signaling. Thus, chromatin state analysis permits identification of regulators of mesodermal specification, including a conserved role for HOPX in governing primitive hematopoiesis.


Subjects
Chromatin/metabolism , Hematopoiesis/genetics , Homeodomain Proteins/genetics , Human Embryonic Stem Cells/metabolism , Mesoderm/metabolism , T-Cell Acute Lymphocytic Leukemia Protein 1/genetics , Transcription, Genetic , Tumor Suppressor Proteins/genetics , Algorithms , CRISPR-Cas Systems , Cell Differentiation , Cell Lineage/genetics , Chromatin/chemistry , Endothelial Cells/cytology , Endothelial Cells/metabolism , Fluorescent Dyes/chemistry , Fluorescent Dyes/metabolism , Genes, Reporter , Human Embryonic Stem Cells/cytology , Humans , Mesoderm/cytology , Mesoderm/growth & development , Myocytes, Cardiac/cytology , Myocytes, Cardiac/metabolism , Signal Transduction , T-Cell Acute Lymphocytic Leukemia Protein 1/metabolism , Tumor Suppressor Proteins/deficiency , beta Catenin/genetics , beta Catenin/metabolism
8.
Sci Rep ; 7(1): 5776, 2017 Jul 18.
Article in English | MEDLINE | ID: mdl-28720872

ABSTRACT

Anatomical subdivisions of the human brain can be associated with different neuronal functions. This functional diversification is reflected by differences in gene expression. By analyzing post-mortem gene expression data from the Allen Brain Atlas, we investigated the impact of transcription factors (TFs) and RNA secondary structures on the regulation of gene expression in the human brain. First, we modeled the expression of a gene as a linear combination of the expression of TFs. We devised an approach to select robust TF-gene interactions and to determine localized contributions to gene expression of TFs. Among the TFs with the most localized contributions, we identified EZH2 in the cerebellum, NR3C1 in the cerebral cortex and SRF in the basal forebrain. Our results suggest that EZH2 is involved in regulating ZIC2 and SHANK1, which have been linked to neurological diseases such as autism spectrum disorder. Second, we associated enriched regulatory elements inside differentially expressed mRNAs with RNA secondary structure motifs. We found a group of purine-uracil repeat RNA secondary structure motifs plus other motifs in neuron-related genes such as ACSL4 and ERLIN2.


Subjects
Brain/metabolism , Gene Expression Profiling , RNA/genetics , Regulatory Elements, Transcriptional/genetics , Transcription Factors/genetics , Algorithms , Autopsy , Gene Ontology , Gene Regulatory Networks , Humans , Models, Genetic , Nucleic Acid Conformation , RNA/chemistry , Transcription Factors/metabolism
9.
Genome Res ; 27(8): 1371-1383, 2017 Aug.
Article in English | MEDLINE | ID: mdl-28487280

ABSTRACT

Structured elements of RNA molecules are essential in, e.g., RNA stabilization, localization, and protein interaction, and their conservation across species suggests a common functional role. We computationally screened vertebrate genomes for conserved RNA structures (CRSs), leveraging structure-based, rather than sequence-based, alignments. After careful correction for sequence identity and GC content, we predict ∼516,000 human genomic regions containing CRSs. We find that a substantial fraction of human-mouse CRS regions (1) colocalize consistently with binding sites of the same RNA binding proteins (RBPs) or (2) are transcribed in corresponding tissues. Additionally, a CaptureSeq experiment revealed expression of many of our CRS regions in human fetal brain, including 662 novel ones. For selected human and mouse candidate pairs, qRT-PCR and in vitro RNA structure probing supported both shared expression and shared structure despite low abundance and low sequence identity. About 30,000 CRS regions are located near coding or long noncoding RNA genes or within enhancers. Structured (CRS overlapping) enhancer RNAs and extended 3' ends have significantly increased expression levels over their nonstructured counterparts. Our findings of transcribed uncharacterized regulatory regions that contain CRSs support their RNA-mediated functionality.


Subjects
Gene Expression Regulation , Nucleic Acid Conformation , RNA/chemistry , RNA/genetics , Regulatory Elements, Transcriptional , Vertebrates/genetics , Animals , Base Sequence , Conserved Sequence , Genome, Human , Humans , Mice , RNA/metabolism , RNA-Binding Proteins/metabolism , Sequence Homology , Transcription, Genetic
10.
Development ; 142(18): 3198-209, 2015 Sep 15.
Article in English | MEDLINE | ID: mdl-26153229

ABSTRACT

During vertebrate development, mesodermal fate choices are regulated by interactions between morphogens such as activin/nodal, BMPs and Wnt/β-catenin that define anterior-posterior patterning and specify downstream derivatives including cardiomyocyte, endothelial and hematopoietic cells. We used human embryonic stem cells to explore how these pathways control mesodermal fate choices in vitro. Varying doses of activin A and BMP4 to mimic cytokine gradient polarization in the anterior-posterior axis of the embryo led to differential activity of Wnt/β-catenin signaling and specified distinct anterior-like (high activin/low BMP) and posterior-like (low activin/high BMP) mesodermal populations. Cardiogenic mesoderm was generated under conditions specifying anterior-like mesoderm, whereas blood-forming endothelium was generated from posterior-like mesoderm, and vessel-forming CD31(+) endothelial cells were generated from all mesoderm origins. Surprisingly, inhibition of β-catenin signaling led to the highly efficient respecification of anterior-like endothelium into beating cardiomyocytes. Cardiac respecification was not observed in posterior-derived endothelial cells. Thus, activin/BMP gradients specify distinct mesodermal subpopulations that generate cell derivatives with unique angiogenic, hemogenic and cardiogenic properties that should be useful for understanding embryogenesis and developing therapeutics.


Subjects
Cell Transdifferentiation/physiology , Endothelium/physiology , Mesoderm/physiology , Myocytes, Cardiac/physiology , Signal Transduction/physiology , beta Catenin/antagonists & inhibitors , Activins/pharmacology , Analysis of Variance , Base Sequence , Bone Morphogenetic Protein 4/pharmacology , Cell Culture Techniques , Cell Transdifferentiation/drug effects , Cells, Cultured , Endothelium/cytology , Flow Cytometry , Fluorescent Antibody Technique , Humans , Mesoderm/cytology , Molecular Sequence Data , Proteomics , Real-Time Polymerase Chain Reaction , Sequence Analysis, RNA , Signal Transduction/drug effects
11.
Proc Natl Acad Sci U S A ; 112(21): E2785-94, 2015 May 26.
Article in English | MEDLINE | ID: mdl-25964336

ABSTRACT

In metazoans, transition from fetal to adult heart is accompanied by a switch in energy metabolism from glycolysis to fatty acid oxidation. The molecular factors regulating this metabolic switch remain largely unexplored. We first demonstrate that the molecular signatures in 1-year (y) matured human embryonic stem cell-derived cardiomyocytes (hESC-CMs) are similar to those seen in in vivo-derived mature cardiac tissues, thus making them an excellent model to study human cardiac maturation. We further show that let-7 is the most highly up-regulated microRNA (miRNA) family during in vitro human cardiac maturation. Gain- and loss-of-function analyses of let-7g in hESC-CMs demonstrate it is both required and sufficient for maturation, but not for early differentiation of CMs. Overexpression of let-7 family members in hESC-CMs enhances cell size, sarcomere length, force of contraction, and respiratory capacity. Interestingly, large-scale expression data, target analysis, and metabolic flux assays suggest this let-7-driven CM maturation could be a result of down-regulation of the phosphoinositide 3 kinase (PI3K)/AKT protein kinase/insulin pathway and an up-regulation of fatty acid metabolism. These results indicate let-7 is an important mediator in augmenting metabolic energetics in maturing CMs. Promoting maturation of hESC-CMs with let-7 overexpression will be highly significant for basic and applied research.


Subjects
MicroRNAs/genetics , MicroRNAs/metabolism , Myocytes, Cardiac/cytology , Myocytes, Cardiac/metabolism , Adult , Cell Differentiation/genetics , Cell Line , Embryonic Stem Cells/cytology , Embryonic Stem Cells/metabolism , Energy Metabolism , Gene Expression Regulation, Developmental , Humans , Models, Cardiovascular , Myocardial Contraction , Myocytes, Cardiac/physiology , Signal Transduction , Tissue Engineering , Up-Regulation
12.
J Med Primatol ; 43(5): 317-28, 2014 Oct.
Article in English | MEDLINE | ID: mdl-24810475

ABSTRACT

BACKGROUND: The genome annotations of rhesus (Macaca mulatta) and cynomolgus (Macaca fascicularis) macaques, two of the most common non-human primate animal models, are limited. METHODS: We analyzed large-scale macaque RNA-based next-generation sequencing (RNAseq) data to identify un-annotated macaque transcripts. RESULTS: For both macaque species, we uncovered thousands of novel isoforms for annotated genes and thousands of un-annotated intergenic transcripts enriched with non-coding RNAs. We also identified thousands of transcript sequences which are partially or completely 'missing' from current macaque genome assemblies. We showed that many newly identified transcripts were differentially expressed during SIV infection of rhesus macaques or during Ebola virus infection of cynomolgus macaques. CONCLUSIONS: For two important macaque species, we uncovered thousands of novel isoforms and un-annotated intergenic transcripts including coding and non-coding RNAs, polyadenylated and non-polyadenylated transcripts. This resource will greatly improve future macaque studies, as demonstrated by their applications in infectious disease studies.


Subjects
Hemorrhagic Fever, Ebola/genetics , Macaca fascicularis , Macaca mulatta , Monkey Diseases/genetics , Simian Acquired Immunodeficiency Syndrome/genetics , Transcriptome , Animals , Ebolavirus/physiology , Hemorrhagic Fever, Ebola/virology , High-Throughput Nucleotide Sequencing , India , Mauritius , Molecular Sequence Data , Monkey Diseases/virology , RNA, Untranslated/genetics , RNA, Untranslated/metabolism , Sequence Analysis, RNA , Simian Acquired Immunodeficiency Syndrome/virology , Simian Immunodeficiency Virus/physiology
13.
Methods Mol Biol ; 1097: 1-31, 2014.
Article in English | MEDLINE | ID: mdl-24639152

ABSTRACT

RNA bioinformatics and computational RNA biology have emerged from implementing methods for predicting the secondary structure of single sequences. The field has evolved to exploit multiple sequences to take evolutionary information into account, such as compensating (and structure preserving) base changes. These methods have been developed further and applied for computational screens of genomic sequence. Furthermore, a number of additional directions have emerged. These include methods to search for RNA 3D structure, RNA-RNA interactions, and design of interfering RNAs (RNAi) as well as methods for interactions between RNA and proteins. Here, we introduce the basic concepts of predicting RNA secondary structure relevant to the further analyses of RNA sequences. We also provide pointers to methods addressing various aspects of RNA bioinformatics and computational RNA biology.


Subjects
Computational Biology/methods , RNA/chemistry , RNA/genetics , Algorithms , Nucleic Acid Conformation , RNA Folding
14.
Methods Mol Biol ; 1097: 303-18, 2014.
Article in English | MEDLINE | ID: mdl-24639166

ABSTRACT

De novo discovery of "motifs" capturing the commonalities among related noncoding structured RNAs is among the most difficult problems in computational biology. This chapter outlines the challenges presented by this problem, together with some approaches towards solving them, with an emphasis on an approach based on the CMfinder program as a case study. Applications to genomic screens for novel de novo structured ncRNAs, including structured RNA elements in untranslated portions of protein-coding genes, are presented.


Subjects
Computational Biology/methods , Genomics/methods , Nucleic Acid Conformation , Nucleotide Motifs , RNA, Untranslated/chemistry , Sequence Analysis, RNA/methods
15.
Bioinformatics ; 30(6): 775-83, 2014 Mar 15.
Article in English | MEDLINE | ID: mdl-24162561

ABSTRACT

MOTIVATION: High-throughput ChIP-seq studies typically identify thousands of peaks for a single transcription factor (TF). It is common for traditional motif discovery tools to predict motifs that are statistically significant against a naïve background distribution but are of questionable biological relevance. RESULTS: We describe a simple yet effective algorithm for discovering differential motifs between two sequence datasets that is effective in eliminating systematic biases and scalable to large datasets. Tested on 207 ENCODE ChIP-seq datasets, our method identifies correct motifs in 78% of the datasets with known motifs, demonstrating improvement in both accuracy and efficiency compared with DREME, another state-of-the-art discriminative motif discovery tool. More interestingly, on the remaining more challenging datasets, we identify common technical or biological factors that compromise the motif search results and use advanced features of our tool to control for these factors. We also present case studies demonstrating the ability of our method to detect single base pair differences in DNA specificity of two similar TFs. Lastly, we demonstrate discovery of key TF motifs involved in tissue specification by examination of high-throughput DNase accessibility data. AVAILABILITY: The motifRG package is publicly available via the Bioconductor repository. CONTACT: yzizhen@fhcrc.org SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Chromatin Immunoprecipitation/methods , High-Throughput Nucleotide Sequencing/methods , Sequence Analysis, DNA/methods , Algorithms , Base Sequence , DNA/genetics , Humans , Transcription Factors/genetics
16.
Metagenomics (Cairo) ; 2: 235646, 2013.
Article in English | MEDLINE | ID: mdl-24013439

ABSTRACT

Study of the human microbiota in relation to human health and disease is a rapidly expanding field. To fully understand the complex relationship between the human gut microbiota and disease risks, study designs that capture the variation within and between human subjects at the population level are required, but this has been hampered by the lack of cost-effective methods to characterize this variation. Illumina sequencing is inexpensive and produces millions of reads per run, but it is unclear whether short reads can adequately represent the microbial community of a human host. In this study, we examined the utility of a profiling method, microbial nucleotide signatures (MNS), focused on low-depth sampling of the human microbiota using Illumina short reads. This method is intended to aid in human population-based studies where large sample sizes are required to adequately capture variation in disease or phenotype differences. We found that, by calculating the nucleotide diversities along the sequenced 16S rRNA gene region, which did not require assembly or phylogenetic identification, we were able to differentiate the gut microbial nucleotide signatures of 9 healthy individuals. When we further subsampled the reads down to 40,000 reads (51 bp long) per sample, the diversity profiles were relatively unchanged. Applying MNS to a public dataset showed that it could differentiate body site differences. The scalability of our approach offers rapid classification of study participants for studies with the sample sizes required for epidemiological studies. Using MNS to classify the microbiome associated with a disease state followed by targeted in-depth sequencing will give a comprehensive understanding of the role of the microbiome in human health.
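
The idea of a per-position diversity profile along the 16S rRNA gene region can be sketched as follows. This is an illustration only: the abstract does not define the diversity statistic, so the sketch assumes the Gini-Simpson index (one minus the sum of squared base frequencies) at each position, and it assumes reads arrive with a known 0-based start coordinate on the region. The function name `position_diversity` is hypothetical.

```python
from collections import Counter


def position_diversity(reads, region_length):
    """Per-position nucleotide diversity from positioned short reads.

    reads: iterable of (start, sequence) with 0-based start coordinates.
    Returns a list of length region_length; each entry is
    1 - sum(p_base ** 2) over observed A/C/G/T at that position
    (an assumed statistic), or None where no read covers the position.
    """
    counts = [Counter() for _ in range(region_length)]
    for start, seq in reads:
        for offset, base in enumerate(seq.upper()):
            pos = start + offset
            if 0 <= pos < region_length and base in "ACGT":
                counts[pos][base] += 1

    profile = []
    for column in counts:
        depth = sum(column.values())
        if depth == 0:
            profile.append(None)  # uncovered position
        else:
            profile.append(1.0 - sum((n / depth) ** 2 for n in column.values()))
    return profile
```

Because the profile needs only base counts per position, it requires neither assembly nor taxonomic assignment, which is consistent with the property the abstract emphasizes. Profiles from two samples could then be compared with any vector distance.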

17.
Skelet Muscle ; 3(1): 8, 2013 Apr 08.
Article in English | MEDLINE | ID: mdl-23566431

ABSTRACT

BACKGROUND: Transcription factor overexpression is common in biological experiments and transcription factor amplification is associated with many cancers, yet few studies have directly compared the DNA-binding profiles of endogenous versus overexpressed transcription factors. METHODS: We analyzed MyoD ChIP-seq data from C2C12 mouse myotubes, primary mouse myotubes, and mouse fibroblasts differentiated into muscle cells by overexpression of MyoD and compared the genome-wide binding profiles and binding site characteristics of endogenous and overexpressed MyoD. RESULTS: Overexpressed MyoD bound to the same sites occupied by endogenous MyoD and possessed the same E-box sequence preference and co-factor site enrichments, and did not bind to new sites with distinct characteristics. CONCLUSIONS: Our data demonstrate a robust fidelity of transcription factor binding sites over a range of expression levels and that increased amounts of transcription factor increase the binding at physiologically bound sites.

18.
Nucleic Acids Res ; 40(22): e171, 2012 Dec.
Article in English | MEDLINE | ID: mdl-22904078

ABSTRACT

We present Quip, a lossless compression algorithm for next-generation sequencing data in the FASTQ and SAM/BAM formats. In addition to implementing reference-based compression, we have developed, to our knowledge, the first assembly-based compressor, using a novel de novo assembly algorithm. A probabilistic data structure is used to dramatically reduce the memory required by traditional de Bruijn graph assemblers, allowing millions of reads to be assembled very efficiently. Read sequences are then stored as positions within the assembled contigs. This is combined with statistical compression of read identifiers, quality scores, alignment information and sequences, effectively collapsing very large data sets to <15% of their original size with no loss of information. AVAILABILITY: Quip is freely available under the 3-clause BSD license from http://cs.washington.edu/homes/dcjones/quip.


Subjects
Algorithms , Data Compression/methods , High-Throughput Nucleotide Sequencing/methods , Probability , Software
19.
J Comput Biol ; 19(9): 989-97, 2012 Sep.
Article in English | MEDLINE | ID: mdl-22897152

ABSTRACT

We analyzed 198 datasets of chromatin immunoprecipitation followed by high throughput sequencing (ChIP-seq) and developed a methodology for identification of high-confidence enhancer and promoter regions from transcription factor ChIP-seq data alone. We identify 32,467 genomic regions marked with ChIP-seq binding peaks in 15 or more experiments as high-confidence cis-regulatory regions. Although the selected regions mark only ~0.67% of the genome, 70.5% of our predicted binding regions fall within independently identified, strongly expression-correlated and histone-marked enhancer regions, which cover ~8% of the genome (Ernst et al., Nature 2011, 473, 43-49). Even more remarkably, 85.6% of our selected regions overlap transcription factor (TF) binding regions identified in evolutionarily conserved DNase1 hypersensitivity cluster regions, which cover 0.75% of the genome (Boyle et al., Genome Research 2011, 21, 456-464). P-values for these overlaps are effectively zero (Z-scores of 328 and 715 respectively). Furthermore, 62% of our selected regions overlap the intersection of the evolutionarily conserved DNase1 hypersensitivity-identified TF-binding regions of Boyle et al. (2011) with the histone-marked enhancers found to be strongly associated with transcriptional activity by Ernst et al. (2011). Two hundred thirty of our candidate cis-regulatory regions overlap cancer-associated variants reported in the Catalogue of Somatic Mutations in Cancer (http://www.sanger.ac.uk/genetics/CGP/cosmic/). We also identify 1,252 potential proximal promoters for the 7,561 disjoint lincRNA regions currently in the Human lincRNA Catalog (www.broadinstitute.org/genome_bio/human_lincrnas/). Our investigation used approximately half of all currently available ENCODE ChIP-seq datasets, suggesting further gains are likely from analysis of all datasets currently available.


Subjects
Chromatin Immunoprecipitation/methods , Enhancer Elements, Genetic , High-Throughput Nucleotide Sequencing/methods , Promoter Regions, Genetic , Transcription Factors/metabolism , Binding Sites , Genome, Human , Genomics/methods , Humans , Protein Binding , Transcription Factors/genetics
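
The selection rule in the abstract above (regions with binding peaks in 15 or more experiments) can be sketched in a few lines. The threshold of 15 comes from the abstract; everything else is an assumption made for illustration, in particular the fixed-width binning of the genome, the 100 bp default bin size, and the function name `high_confidence_bins`. The paper's actual region-merging procedure is not described in the abstract and may differ.

```python
from collections import defaultdict


def high_confidence_bins(peaks, min_experiments=15, bin_size=100):
    """Return genomic bins covered by peaks from enough distinct experiments.

    peaks: iterable of (experiment_id, chrom, start, end), half-open
    0-based coordinates. Fixed-width bins are an illustrative assumption.
    Returns a sorted list of (chrom, bin_start, bin_end).
    """
    support = defaultdict(set)  # (chrom, bin index) -> experiment ids
    for experiment_id, chrom, start, end in peaks:
        if end <= start:
            continue  # ignore empty or malformed intervals
        first_bin = start // bin_size
        last_bin = (end - 1) // bin_size
        for index in range(first_bin, last_bin + 1):
            support[(chrom, index)].add(experiment_id)

    return sorted(
        (chrom, index * bin_size, (index + 1) * bin_size)
        for (chrom, index), experiments in support.items()
        if len(experiments) >= min_experiments
    )
```

Counting distinct experiment identifiers, not raw peaks, keeps a single experiment with several overlapping peaks from inflating the support for a bin.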
20.
BMC Genomics ; 13: 214, 2012 May 31.
Article in English | MEDLINE | ID: mdl-22651826

ABSTRACT

BACKGROUND: Post-transcriptional control of gene expression is mostly conducted by specific elements in untranslated regions (UTRs) of mRNAs, in collaboration with specific binding proteins and RNAs. In several well characterized cases, these RNA elements are known to form stable secondary structures. RNA secondary structures also may have major functional implications for long noncoding RNAs (lncRNAs). Recent transcriptional data has indicated the importance of lncRNAs in brain development and function. However, no methodical efforts to investigate this have been undertaken. Here, we aim to systematically analyze the potential for RNA structure in brain-expressed transcripts. RESULTS: By comprehensive spatial expression analysis of the adult mouse in situ hybridization data of the Allen Mouse Brain Atlas, we show that transcripts (coding as well as non-coding) associated with in silico predicted structured probes are highly and significantly enriched in almost all analyzed brain regions. Functional implications of these RNA structures and their role in the brain are discussed in detail along with specific examples. We observe that mRNAs with a structure prediction in their UTRs are enriched for binding, transport and localization gene ontology categories. In addition, after manual examination we observe agreement between RNA binding protein interaction sites near the 3' UTR structures and correlated expression patterns. CONCLUSIONS: Our results show a potential use for RNA structures in expressed coding as well as noncoding transcripts in the adult mouse brain, and describe the role of structured RNAs in the context of intracellular signaling pathways and regulatory networks. Based on this data we hypothesize that RNA structure is widely involved in transcriptional and translational regulatory mechanisms in the brain and ultimately plays a role in brain function.


Subjects
Brain/metabolism , Computational Biology/methods , Nucleic Acid Conformation , RNA/chemistry , RNA/genetics , Anatomy, Artistic , Animals , Atlases as Topic , Cluster Analysis , Gene Expression Profiling , Gene Expression Regulation , Gene Regulatory Networks/genetics , Genetic Variation , In Situ Hybridization , Mice , Molecular Sequence Annotation , Protein Binding/genetics , RNA Probes/metabolism , RNA, Messenger/genetics , RNA, Messenger/metabolism , RNA, Untranslated/genetics , RNA, Untranslated/metabolism , RNA-Binding Proteins/genetics , RNA-Binding Proteins/metabolism , Thermodynamics , Untranslated Regions/genetics