Results 1 - 20 of 38
1.
Cell ; 166(5): 1269-1281.e19, 2016 Aug 25.
Article in English | MEDLINE | ID: mdl-27565349

ABSTRACT

The glucocorticoid receptor (GR) binds the human genome at >10,000 sites but only regulates the expression of hundreds of genes. To determine the functional effect of each site, we measured the glucocorticoid (GC) responsive activity of nearly all GR binding sites (GBSs) captured using chromatin immunoprecipitation (ChIP) in A549 cells. 13% of GBSs assayed had GC-induced activity. The responsive sites were defined by direct GR binding via a GC response element (GRE) and exclusively increased reporter-gene expression. Meanwhile, most GBSs lacked GC-induced reporter activity. The non-responsive sites had epigenetic features of steady-state enhancers and clustered around direct GBSs. Together, our data support a model in which clusters of GBSs observed with ChIP-seq reflect interactions between direct and tethered GBSs over tens of kilobases. We further show that those interactions can synergistically modulate the activity of direct GBSs and may therefore play a major role in driving gene activation in response to GCs.


Subject(s)
Genome, Human , Glucocorticoids/metabolism , Receptors, Glucocorticoid/metabolism , Transcription Factors/metabolism , Transcriptional Activation , A549 Cells , Binding Sites/drug effects , Chromatin Immunoprecipitation , Dexamethasone/metabolism , Dexamethasone/pharmacology , Genes, Reporter , Glucocorticoids/pharmacology , Humans , Protein Binding/drug effects , Response Elements
2.
Am J Hum Genet ; 108(8): 1436-1449, 2021 Aug 05.
Article in English | MEDLINE | ID: mdl-34216551

ABSTRACT

Despite widespread clinical genetic testing, many individuals with suspected genetic conditions lack a precise diagnosis, limiting their opportunity to take advantage of state-of-the-art treatments. In some cases, testing reveals difficult-to-evaluate structural differences, candidate variants that do not fully explain the phenotype, single pathogenic variants in recessive disorders, or no variants in genes of interest. Thus, there is a need for better tools to identify a precise genetic diagnosis in individuals when conventional testing approaches have been exhausted. We performed targeted long-read sequencing (T-LRS) using adaptive sampling on the Oxford Nanopore platform on 40 individuals, 10 of whom lacked a complete molecular diagnosis. We computationally targeted up to 151 Mbp of sequence per individual and searched for pathogenic substitutions, structural variants, and methylation differences using a single data source. We detected all genomic aberrations identified by prior clinical testing, including single-nucleotide variants, copy number changes, repeat expansions, and methylation differences. In 8/8 individuals with complex structural rearrangements, T-LRS enabled more precise resolution of the mutation, leading to changes in clinical management in one case. In ten individuals with suspected Mendelian conditions lacking a precise genetic diagnosis, T-LRS identified pathogenic or likely pathogenic variants in six and variants of uncertain significance in two others. T-LRS accurately identifies pathogenic structural variants, resolves complex rearrangements, and identifies Mendelian variants not detected by other technologies. T-LRS represents an efficient and cost-effective strategy to evaluate high-priority genes and regions or complex clinical testing results.


Subject(s)
Chromosome Aberrations , Cytogenetic Analysis/methods , Genetic Diseases, Inborn/diagnosis , Genetic Diseases, Inborn/genetics , Genetic Predisposition to Disease , Genome, Human , Mutation , DNA Copy Number Variations , Female , Genetic Testing , High-Throughput Nucleotide Sequencing , Humans , Karyotyping , Male , Sequence Analysis, DNA
3.
Genome Res ; 31(5): 877-889, 2021 May.
Article in English | MEDLINE | ID: mdl-33722938

ABSTRACT

High-throughput reporter assays such as self-transcribing active regulatory region sequencing (STARR-seq) have made it possible to measure regulatory element activity across the entire human genome at once. The resulting data, however, present substantial analytical challenges. Here, we identify technical biases that explain most of the variance in STARR-seq data. We then develop a statistical model to correct those biases and to improve detection of regulatory elements. This approach substantially improves precision and recall over current methods, improves detection of both activating and repressive regulatory elements, and controls for false discoveries despite strong local correlations in signal.


Subject(s)
Enhancer Elements, Genetic , Genome, Human , Bias , High-Throughput Nucleotide Sequencing/methods , Humans
4.
Am J Med Genet A ; : e63880, 2024 Oct 04.
Article in English | MEDLINE | ID: mdl-39364610

ABSTRACT

Variation in the non-coding genome represents an understudied mechanism of disease and it remains challenging to predict if single nucleotide variants, small insertions and deletions, or structural variants in non-coding genomic regions will be detrimental. Our approach using complementary RNA-seq and targeted long-read DNA sequencing can prioritize identification of non-coding variants that lead to disease via alteration of gene splicing or expression. We have identified a patient with primary ciliary dyskinesia with a pathogenic coding variant on one allele of the SPAG1 gene, while the second allele appears normal by whole exome sequencing despite an autosomal recessive inheritance pattern. RNA sequencing revealed reduced SPAG1 transcript levels and exclusive allele specific expression of the known pathogenic allele, suggesting the presence of a non-coding variant on the second allele that impacts transcription. Targeted long-read DNA sequencing identified a heterozygous 3 kilobase deletion of the 5' untranslated region of SPAG1, overlapping the promoter and first non-coding exon. This non-coding deletion was missed by whole exome sequencing and gene-specific deletion/duplication analysis, highlighting the importance of investigating the non-coding genome in patients with "missing" disease-causing variation. This paradigm demonstrates the utility of both RNA and long-read DNA sequencing in identifying pathogenic non-coding variants in patients with unexplained genetic disease.

5.
Mol Ther ; 29(11): 3243-3257, 2021 Nov 03.
Article in English | MEDLINE | ID: mdl-34509668

ABSTRACT

Targeted gene-editing strategies have emerged as promising therapeutic approaches for the permanent treatment of inherited genetic diseases. However, precise gene correction and insertion approaches using homology-directed repair are still limited by low efficiencies. Consequently, many gene-editing strategies have focused on removal or disruption, rather than repair, of genomic DNA. In contrast, homology-independent targeted integration (HITI) has been reported to effectively insert DNA sequences at targeted genomic loci. This approach could be particularly useful for restoring full-length sequences of genes affected by a spectrum of mutations that are also too large to deliver by conventional adeno-associated virus (AAV) vectors. Here, we utilize an AAV-based, HITI-mediated approach for correction of full-length dystrophin expression in a humanized mouse model of Duchenne muscular dystrophy (DMD). We co-deliver CRISPR-Cas9 and a donor DNA sequence to insert the missing human exon 52 into its corresponding position within the DMD gene and achieve full-length dystrophin correction in skeletal and cardiac muscle. Additionally, as a proof-of-concept strategy to correct genetic mutations characterized by diverse patient mutations, we deliver a superexon donor encoding the last 28 exons of the DMD gene as a therapeutic strategy to restore full-length dystrophin in >20% of the DMD patient population. This work highlights the potential of HITI-mediated gene correction for diverse DMD mutations and advances genome editing toward realizing the promise of full-length gene restoration to treat genetic disease.


Subject(s)
CRISPR-Cas Systems , Dependovirus/genetics , Dystrophin/genetics , Exons , Gene Editing , Genetic Vectors/genetics , Muscular Dystrophy, Duchenne/genetics , Muscular Dystrophy, Duchenne/therapy , Animals , Disease Models, Animal , Gene Expression , Gene Order , Gene Transfer Techniques , Genetic Engineering , Genetic Therapy/methods , Humans , Mice , Mice, Transgenic , Muscle, Skeletal/metabolism , Mutation , Myocardium/metabolism , Virus Integration
6.
Genome Res ; 28(9): 1272-1284, 2018 Sep.
Article in English | MEDLINE | ID: mdl-30097539

ABSTRACT

Glucocorticoids are potent steroid hormones that regulate immunity and metabolism by activating the transcription factor (TF) activity of glucocorticoid receptor (GR). Previous models have proposed that DNA binding motifs and sites of chromatin accessibility predetermine GR binding and activity. However, there are vast excesses of both features relative to the number of GR binding sites. Thus, these features alone are unlikely to account for the specificity of GR binding and activity. To identify genomic and epigenetic contributions to GR binding specificity and the downstream changes resultant from GR binding, we performed hundreds of genome-wide measurements of TF binding, epigenetic state, and gene expression across a 12-h time course of glucocorticoid exposure. We found that glucocorticoid treatment induces GR to bind to nearly all pre-established enhancers within minutes. However, GR binds to only a small fraction of the set of accessible sites that lack enhancer marks. Once GR is bound to enhancers, a combination of enhancer motif composition and interactions between enhancers then determines the strength and persistence of GR binding, which consequently correlates with dramatic shifts in enhancer activation. Over the course of several hours, highly coordinated changes in TF binding and histone modification occupancy occur specifically within enhancers, and these changes correlate with changes in the expression of nearby genes. Following GR binding, changes in the binding of other TFs precede changes in chromatin accessibility, suggesting that other TFs are also sensitive to genomic features beyond that of accessibility.


Subject(s)
Enhancer Elements, Genetic , Histone Code , Nucleotide Motifs , Receptors, Glucocorticoid/metabolism , Transcriptional Activation , Cell Line, Tumor , Epigenesis, Genetic , Humans , Protein Binding , Transcription Factors/metabolism
7.
Bioinformatics ; 36(2): 331-338, 2020 Jan 15.
Article in English | MEDLINE | ID: mdl-31368479

ABSTRACT

MOTIVATION: High-throughput reporter assays dramatically improve our ability to assign function to noncoding genetic variants, by measuring allelic effects on gene expression in the controlled setting of a reporter gene. Unlike genetic association tests, such assays are not confounded by linkage disequilibrium when loci are independently assayed. These methods can thus improve the identification of causal disease mutations. While work continues on improving experimental aspects of these assays, less effort has gone into developing methods for assessing the statistical significance of assay results, particularly in the case of rare variants captured from patient DNA. RESULTS: We describe a Bayesian hierarchical model, called Bayesian Inference of Regulatory Differences, which integrates prior information and explicitly accounts for variability between experimental replicates. The model produces substantially more accurate predictions than existing methods when allele frequencies are low, which is of clear advantage in the search for disease-causing variants in DNA captured from patient cohorts. Using the model, we demonstrate a clear tradeoff between variant sequencing coverage and numbers of biological replicates, and we show that the use of additional biological replicates decreases variance in estimates of effect size, due to the properties of the Poisson-binomial distribution. We also provide a power and sample size calculator, which facilitates decision making in experimental design parameters. AVAILABILITY AND IMPLEMENTATION: The software is freely available from www.geneprediction.org/bird. The experimental design web tool can be accessed at http://67.159.92.22:8080. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Software , Alleles , Bayes Theorem , Gene Frequency , Humans , Linkage Disequilibrium
8.
Bioinformatics ; 34(21): 3616-3623, 2018 Nov 01.
Article in English | MEDLINE | ID: mdl-29701825

ABSTRACT

Motivation: Genetic variation that disrupts gene function by altering gene splicing between individuals can substantially influence traits and disease. In those cases, accurately predicting the effects of genetic variation on splicing can be highly valuable for investigating the mechanisms underlying those traits and diseases. While methods have been developed to generate high quality computational predictions of gene structures in reference genomes, the same methods perform poorly when used to predict the potentially deleterious effects of genetic changes that alter gene splicing between individuals. Underlying that discrepancy in predictive ability are the common assumptions by reference gene finding algorithms that genes are conserved, well-formed and produce functional proteins. Results: We describe a probabilistic approach for predicting recent changes to gene structure that may or may not conserve function. The model is applicable to both coding and non-coding genes, and can be trained on existing gene annotations without requiring curated examples of aberrant splicing. We apply this model to the problem of predicting altered splicing patterns in the genomes of individual humans, and we demonstrate that performing gene-structure prediction without relying on conserved coding features is feasible. The model predicts an unexpected abundance of variants that create de novo splice sites, an observation supported by both simulations and empirical data from RNA-seq experiments. While these de novo splice variants are commonly misinterpreted by other tools as coding or non-coding variants of little or no effect, we find that in some cases they can have large effects on splicing activity and protein products and we propose that they may commonly act as cryptic factors in disease. Availability and implementation: The software is available from geneprediction.org/SGRF. Supplementary information: Supplementary information is available at Bioinformatics online.


Subject(s)
Exons , RNA Splicing , Software , Humans , Molecular Sequence Annotation , Sequence Analysis, RNA
9.
Genome Res ; 25(8): 1206-14, 2015 Aug.
Article in English | MEDLINE | ID: mdl-26084464

ABSTRACT

We report a novel high-throughput method to empirically quantify individual-specific regulatory element activity at the population scale. The approach combines targeted DNA capture with a high-throughput reporter gene expression assay. As demonstration, we measured the activity of more than 100 putative regulatory elements from 95 individuals in a single experiment. In agreement with previous reports, we found that most genetic variants have weak effects on distal regulatory element activity. Because haplotypes are typically maintained within but not between assayed regulatory elements, the approach can be used to identify causal regulatory haplotypes that likely contribute to human phenotypes. Finally, we demonstrate the utility of the method to functionally fine map causal regulatory variants in regions of high linkage disequilibrium identified by expression quantitative trait loci (eQTL) analyses.


Subject(s)
Genetic Variation , High-Throughput Nucleotide Sequencing/methods , Regulatory Sequences, Nucleic Acid , Computational Biology/methods , Genome, Human , Haplotypes , Humans , Patient-Specific Modeling , Quantitative Trait Loci
10.
Bioinformatics ; 33(10): 1437-1446, 2017 May 15.
Article in English | MEDLINE | ID: mdl-28011790

ABSTRACT

MOTIVATION: The accurate interpretation of genetic variants is critical for characterizing genotype-phenotype associations. Because the effects of genetic variants can depend strongly on their local genomic context, accurate genome annotations are essential. Furthermore, as some variants have the potential to disrupt or alter gene structure, variant interpretation efforts stand to gain from the use of individualized annotations that account for differences in gene structure between individuals or strains. RESULTS: We describe a suite of software tools for identifying possible functional changes in gene structure that may result from sequence variants. ACE ('Assessing Changes to Exons') converts phased genotype calls to a collection of explicit haplotype sequences, maps transcript annotations onto them, detects gene-structure changes and their possible repercussions, and identifies several classes of possible loss of function. Novel transcripts predicted by ACE are commonly supported by spliced RNA-seq reads, and can be used to improve read alignment and transcript quantification when an individual-specific genome sequence is available. Using publicly available RNA-seq data, we show that ACE predictions confirm earlier results regarding the quantitative effects of nonsense-mediated decay, and we show that predicted loss-of-function events are highly concordant with patterns of intolerance to mutations across the human population. ACE can be readily applied to diverse species including animals and plants, making it a broadly useful tool for use in eukaryotic population-based resequencing projects, particularly for assessing the joint impact of all variants at a locus. AVAILABILITY AND IMPLEMENTATION: ACE is written in open-source C++ and Perl and is available from geneprediction.org/ACE. CONTACT: myandell@genetics.utah.edu or tim.reddy@duke.edu. SUPPLEMENTARY INFORMATION: Supplementary information is available at Bioinformatics online.


Subject(s)
Genomics/methods , Polymorphism, Genetic , Sequence Analysis, RNA/methods , Software , Animals , Eukaryota/genetics , Exons , Haplotypes , Humans , Mutation , RNA Splicing
11.
Nat Methods ; 10(7): 630-3, 2013 Jul.
Article in English | MEDLINE | ID: mdl-23708386

ABSTRACT

High-throughput sequencing has opened numerous possibilities for the identification of regulatory RNA-binding events. Cross-linking and immunoprecipitation of Argonaute proteins can pinpoint a microRNA (miRNA) target site within tens of bases but leaves the identity of the miRNA unresolved. A flexible computational framework, microMUMMIE, integrates sequence with cross-linking features and reliably identifies the miRNA family involved in each binding event. It considerably outperforms sequence-only approaches and quantifies the prevalence of noncanonical binding modes.


Subject(s)
Algorithms , Protein Interaction Mapping/methods , RNA-Binding Proteins/genetics , RNA/genetics , RNA/metabolism , Sequence Analysis, RNA/methods , Systems Integration
12.
Mol Ther ; 23(3): 523-32, 2015 Mar.
Article in English | MEDLINE | ID: mdl-25492562

ABSTRACT

Duchenne muscular dystrophy (DMD) is caused by genetic mutations that result in the absence of dystrophin protein expression. Oligonucleotide-induced exon skipping can restore the dystrophin reading frame and protein production. However, this requires continuous drug administration and may not generate complete skipping of the targeted exon. In this study, we apply genome editing with zinc finger nucleases (ZFNs) to permanently remove essential splicing sequences in exon 51 of the dystrophin gene and thereby exclude exon 51 from the resulting dystrophin transcript. This approach can restore the dystrophin reading frame in ~13% of DMD patient mutations. Transfection of two ZFNs targeted to sites flanking the exon 51 splice acceptor into DMD patient myoblasts led to deletion of this genomic sequence. A clonal population was isolated with this deletion and following differentiation we confirmed loss of exon 51 from the dystrophin mRNA transcript and restoration of dystrophin protein expression. Furthermore, transplantation of corrected cells into immunodeficient mice resulted in human dystrophin expression localized to the sarcolemmal membrane. Finally, we quantified ZFN toxicity in human cells and mutagenesis at predicted off-target sites. This study demonstrates a powerful method to restore the dystrophin reading frame and protein expression by permanently deleting exons.


Subject(s)
Dystrophin/genetics , Exons , Genetic Therapy/methods , RNA Editing , RNA, Messenger/genetics , Zinc Fingers/genetics , Animals , Base Sequence , Dystrophin/biosynthesis , Dystrophin/chemistry , Electroporation , Endonucleases/genetics , Endonucleases/metabolism , Humans , Mice , Mice, Inbred NOD , Mice, SCID , Molecular Sequence Data , Muscular Dystrophy, Duchenne/genetics , Muscular Dystrophy, Duchenne/metabolism , Muscular Dystrophy, Duchenne/pathology , Muscular Dystrophy, Duchenne/therapy , Myoblasts/metabolism , Myoblasts/pathology , Open Reading Frames , Plasmids/chemistry , Plasmids/genetics , RNA Splicing , RNA, Messenger/chemistry , RNA, Messenger/metabolism , Sequence Deletion
13.
Bioinformatics ; 30(14): 1958-64, 2014 Jul 15.
Article in English | MEDLINE | ID: mdl-24659106

ABSTRACT

MOTIVATION: High-throughput sequencing of RNA in vivo facilitates many applications, not the least of which is the cataloging of variant splice isoforms of protein-coding messenger RNAs. Although many solutions have been proposed for reconstructing putative isoforms from deep sequencing data, these generally take as their substrate the collective alignment structure of RNA-seq reads and ignore the biological signals present in the actual nucleotide sequence. The majority of these solutions are graph-theoretic, relying on a splice graph representing the splicing patterns and exon expression levels indicated by the spliced-alignment process. RESULTS: We show how to augment splice graphs with additional information reflecting the biology of transcription, splicing and translation, to produce what we call an ORF (open reading frame) graph. We then show how ORF graphs can be used to produce isoform predictions with higher accuracy than current state-of-the-art approaches. AVAILABILITY AND IMPLEMENTATION: RSVP is available as C++ source code under an open-source licence: http://ohlerlab.mdc-berlin.de/software/RSVP/.


Subject(s)
High-Throughput Nucleotide Sequencing/methods , Open Reading Frames , RNA Isoforms/chemistry , Sequence Analysis, RNA/methods , Arabidopsis/genetics , Exons , Humans , RNA Isoforms/metabolism , RNA Splicing , Software
14.
Bioinformatics ; 29(13): i27-35, 2013 Jul 01.
Article in English | MEDLINE | ID: mdl-23812993

ABSTRACT

MOTIVATION: Computational approaches for the annotation of phenotypes from image data have shown promising results across many applications, and provide rich and valuable information for studying gene function and interactions. While data are often available both at high spatial resolution and across multiple time points, phenotypes are frequently annotated independently, for individual time points only. In particular, for the analysis of developmental gene expression patterns, it is biologically sensible when images across multiple time points are jointly accounted for, such that spatial and temporal dependencies are captured simultaneously. METHODS: We describe a discriminative undirected graphical model to label gene-expression time-series image data, with an efficient training and decoding method based on the junction tree algorithm. The approach is based on an effective feature selection technique, consisting of a non-parametric sparse Bayesian factor analysis model. The result is a flexible framework, which can handle large-scale data with noisy incomplete samples, i.e. it can tolerate data missing from individual time points. RESULTS: Using the annotation of gene expression patterns across stages of Drosophila embryonic development as an example, we demonstrate that our method achieves superior accuracy, gained by jointly annotating phenotype sequences, when compared with previous models that annotate each stage in isolation. The experimental results on missing data indicate that our joint learning method successfully annotates genes for which no expression data are available for one or more stages.


Subject(s)
Gene Expression Profiling/methods , Image Processing, Computer-Assisted/methods , Models, Statistical , Algorithms , Animals , Bayes Theorem , Drosophila/embryology , Drosophila/genetics , Embryonic Development/genetics , Factor Analysis, Statistical , In Situ Hybridization , RNA, Messenger/analysis , RNA, Messenger/chemistry , Statistics, Nonparametric , Vocabulary, Controlled
15.
bioRxiv ; 2024 Aug 13.
Article in English | MEDLINE | ID: mdl-39211106

ABSTRACT

Motivation: Allele-specific expression (ASE) analyses aim to detect imbalanced expression of maternal versus paternal copies of an autosomal gene. Such allelic imbalance can result from a variety of cis-acting causes, including disruptive mutations within one copy of a gene that impact the stability of transcripts, as well as regulatory variants outside the gene that impact transcription initiation. Current methods for ASE estimation suffer from a number of shortcomings, such as relying on only one variant within a gene, assuming perfect phasing information across multiple variants within a gene, or failing to account for alignment biases and possible genotyping errors. Results: We developed BEASTIE, a Bayesian hierarchical model designed for precise ASE quantification at the gene level, based on given genotypes and RNA-Seq data. BEASTIE addresses the complexities of allelic mapping bias, genotyping error, and phasing errors by incorporating empirical phasing error rates derived from Genome-in-a-Bottle individual NA12878. BEASTIE surpasses existing methods in accuracy, especially in scenarios with high phasing errors. This improvement is critical for identifying rare genetic variants often obscured by such errors. Through rigorous validation on simulated data and application to real data from the 1000 Genomes Project, we establish the robustness of BEASTIE. These findings underscore the value of BEASTIE in revealing patterns of ASE across gene sets and pathways. Availability and Implementation: The software is freely available from https://github.com/x811zou/BEASTIE as Python source code and as a Docker image. Supplementary information: Additional information is available online.

16.
bioRxiv ; 2024 Sep 04.
Article in English | MEDLINE | ID: mdl-39282389

ABSTRACT

Recent technological developments in single-cell RNA-seq CRISPR screens enable high-throughput investigation of the genome. Through transduction of a gRNA library to a cell population followed by transcriptomic profiling by scRNA-seq, it is possible to characterize the effects of thousands of genomic perturbations on global gene expression. A major source of noise in scRNA-seq CRISPR screens is ambient gRNAs, which are contaminating gRNAs that likely originate from other cells. If not properly filtered, ambient gRNAs can result in an excess of false positive gRNA assignments. Here, we utilize CRISPR barnyard assays to characterize ambient gRNA noise in single-cell CRISPR screens. We use these datasets to develop and train CLEANSER, a mixture model that identifies and filters ambient gRNA noise. This model takes advantage of the bimodal distribution between native and ambient gRNAs and includes both gRNA and cell-specific normalization parameters, correcting for confounding technical factors that affect individual gRNAs and cells. The output of CLEANSER is the probability that a gRNA-cell assignment is in the native distribution over the ambient distribution. We find that ambient gRNA filtering methods impact differential gene expression analysis outcomes and that CLEANSER outperforms alternate approaches by increasing gRNA-cell assignment accuracy.

17.
Nature ; 450(7172): 1096-9, 2007 Dec 13.
Article in English | MEDLINE | ID: mdl-18075594

ABSTRACT

All metazoan eukaryotes express microRNAs (miRNAs), roughly 22-nucleotide regulatory RNAs that can repress the expression of messenger RNAs bearing complementary sequences. Several DNA viruses also express miRNAs in infected cells, suggesting a role in viral replication and pathogenesis. Although specific viral miRNAs have been shown to autoregulate viral mRNAs or downregulate cellular mRNAs, the function of most viral miRNAs remains unknown. Here we report that the miR-K12-11 miRNA encoded by Kaposi's-sarcoma-associated herpes virus (KSHV) shows significant homology to cellular miR-155, including the entire miRNA 'seed' region. Using a range of assays, we show that expression of physiological levels of miR-K12-11 or miR-155 results in the downregulation of an extensive set of common mRNA targets, including genes with known roles in cell growth regulation. Our findings indicate that viral miR-K12-11 functions as an orthologue of cellular miR-155 and probably evolved to exploit a pre-existing gene regulatory pathway in B cells. Moreover, the known aetiological role of miR-155 in B-cell transformation suggests that miR-K12-11 may contribute to the induction of KSHV-positive B-cell tumours in infected patients.


Subject(s)
Gene Expression Regulation , Herpesvirus 8, Human/genetics , MicroRNAs/genetics , RNA, Viral/genetics , Sequence Homology, Nucleic Acid , 3' Untranslated Regions/genetics , 3' Untranslated Regions/metabolism , B-Lymphocytes/metabolism , B-Lymphocytes/pathology , Basic-Leucine Zipper Transcription Factors/genetics , Basic-Leucine Zipper Transcription Factors/metabolism , Cell Line , Cell Transformation, Viral/genetics , Fanconi Anemia Complementation Group Proteins/genetics , Fanconi Anemia Complementation Group Proteins/metabolism , Gene Expression Profiling , Humans , MicroRNAs/metabolism , Proto-Oncogene Proteins c-fos/genetics , Proto-Oncogene Proteins c-fos/metabolism , RNA, Viral/metabolism , Substrate Specificity
18.
PLoS Comput Biol ; 6(12): e1001037, 2010 Dec 16.
Article in English | MEDLINE | ID: mdl-21187896

ABSTRACT

The computational detection of regulatory elements in DNA is a difficult but important problem impacting our progress in understanding the complex nature of eukaryotic gene regulation. Attempts to utilize cross-species conservation for this task have been hampered both by evolutionary changes of functional sites and poor performance of general-purpose alignment programs when applied to non-coding sequence. We describe a new and flexible framework for modeling binding site evolution in multiple related genomes, based on phylogenetic pair hidden Markov models which explicitly model the gain and loss of binding sites along a phylogeny. We demonstrate the value of this framework for both the alignment of regulatory regions and the inference of precise binding-site locations within those regions. As the underlying formalism is a stochastic, generative model, it can also be used to simulate the evolution of regulatory elements. Our implementation is scalable in terms of numbers of species and sequence lengths and can produce alignments and binding-site predictions with accuracy rivaling or exceeding current systems that specialize in only alignment or only binding-site prediction. We demonstrate the validity and power of various model components on extensive simulations of realistic sequence data and apply a specific model to study Drosophila enhancers in as many as ten related genomes and in the presence of gain and loss of binding sites. Different models and modeling assumptions can be easily specified, thus providing an invaluable tool for the exploration of biological hypotheses that can drive improvements in our understanding of the mechanisms and evolution of gene regulation.


Subject(s)
Computational Biology/methods , Evolution, Molecular , Markov Chains , Regulatory Elements, Transcriptional/genetics , Sequence Alignment/methods , Animals , Base Sequence , Computer Simulation , Drosophila melanogaster/genetics , Gene Expression Regulation , Molecular Sequence Data , Phylogeny , ROC Curve , Sequence Analysis, DNA
19.
Bioinformatics ; 25(2): 175-82, 2009 Jan 15.
Article in English | MEDLINE | ID: mdl-19017657

ABSTRACT

MOTIVATION: The modeling of conservation patterns in genomic DNA has become increasingly popular for a number of bioinformatic applications. While several systems developed to date incorporate context-dependence in their substitution models, the impact on computational complexity and generalization ability of the resulting higher order models invites the question of whether simpler approaches to context modeling might permit appreciable reductions in model complexity and computational cost, without sacrificing prediction accuracy. RESULTS: We formulate several alternative methods for context modeling based on windowed Bayesian networks, and compare their effects on both accuracy and computational complexity for the task of discriminating functionally distinct segments in vertebrate DNA. Our results show that substantial reductions in the complexity of both the model and the associated inference algorithm can be achieved without reducing predictive accuracy.


Subject(s)
Sequence Analysis, DNA/methods , Algorithms , Bayes Theorem , Computer Simulation , DNA/chemistry , Genome , Models, Genetic , Software
20.
PLoS Biol ; 4(9): e286, 2006 Sep.
Article in English | MEDLINE | ID: mdl-16933976

ABSTRACT

The ciliate Tetrahymena thermophila is a model organism for molecular and cellular biology. Like other ciliates, this species has separate germline and soma functions that are embodied by distinct nuclei within a single cell. The germline-like micronucleus (MIC) has its genome held in reserve for sexual reproduction. The soma-like macronucleus (MAC), which possesses a genome processed from that of the MIC, is the center of gene expression and does not directly contribute DNA to sexual progeny. We report here the shotgun sequencing, assembly, and analysis of the MAC genome of T. thermophila, which is approximately 104 Mb in length and composed of approximately 225 chromosomes. Overall, the gene set is robust, with more than 27,000 predicted protein-coding genes, 15,000 of which have strong matches to genes in other organisms. The functional diversity encoded by these genes is substantial and reflects the complexity of processes required for a free-living, predatory, single-celled organism. This is highlighted by the abundance of lineage-specific duplications of genes with predicted roles in sensing and responding to environmental conditions (e.g., kinases), using diverse resources (e.g., proteases and transporters), and generating structural complexity (e.g., kinesins and dyneins). In contrast to the other lineages of alveolates (apicomplexans and dinoflagellates), no compelling evidence could be found for plastid-derived genes in the genome. UGA, the only T. thermophila stop codon, is used in some genes to encode selenocysteine, thus making this organism the first known with the potential to translate all 64 codons in nuclear genes into amino acids. We present genomic evidence supporting the hypothesis that the excision of DNA from the MIC to generate the MAC specifically targets foreign DNA as a form of genome self-defense. The combination of the genome sequence, the functional diversity encoded therein, and the presence of some pathways missing from other model organisms makes T. thermophila an ideal model for functional genomic studies to address biological, biomedical, and biotechnological questions of fundamental importance.


Subject(s)
Genome, Protozoan , Macronucleus/genetics , Models, Biological , Tetrahymena thermophila/genetics , Animals , Cells, Cultured , Chromosome Mapping/methods , Chromosomes , Databases, Genetic , Eukaryotic Cells/physiology , Evolution, Molecular , Micronucleus, Germline/genetics , Models, Animal , Phylogeny , Signal Transduction