Results 1 - 20 of 33
1.
Cell ; 166(5): 1269-1281.e19, 2016 Aug 25.
Article in English | MEDLINE | ID: mdl-27565349

ABSTRACT

The glucocorticoid receptor (GR) binds the human genome at >10,000 sites but only regulates the expression of hundreds of genes. To determine the functional effect of each site, we measured the glucocorticoid (GC) responsive activity of nearly all GR binding sites (GBSs) captured using chromatin immunoprecipitation (ChIP) in A549 cells. 13% of GBSs assayed had GC-induced activity. The responsive sites were defined by direct GR binding via a GC response element (GRE) and exclusively increased reporter-gene expression. Meanwhile, most GBSs lacked GC-induced reporter activity. The non-responsive sites had epigenetic features of steady-state enhancers and clustered around direct GBSs. Together, our data support a model in which clusters of GBSs observed with ChIP-seq reflect interactions between direct and tethered GBSs over tens of kilobases. We further show that those interactions can synergistically modulate the activity of direct GBSs and may therefore play a major role in driving gene activation in response to GCs.


Subjects
Genome, Human , Glucocorticoids/metabolism , Receptors, Glucocorticoid/metabolism , Transcription Factors/metabolism , Transcriptional Activation , A549 Cells , Binding Sites/drug effects , Chromatin Immunoprecipitation , Dexamethasone/metabolism , Dexamethasone/pharmacology , Genes, Reporter , Glucocorticoids/pharmacology , Humans , Protein Binding/drug effects , Response Elements
2.
Am J Hum Genet ; 108(8): 1436-1449, 2021 08 05.
Article in English | MEDLINE | ID: mdl-34216551

ABSTRACT

Despite widespread clinical genetic testing, many individuals with suspected genetic conditions lack a precise diagnosis, limiting their opportunity to take advantage of state-of-the-art treatments. In some cases, testing reveals difficult-to-evaluate structural differences, candidate variants that do not fully explain the phenotype, single pathogenic variants in recessive disorders, or no variants in genes of interest. Thus, there is a need for better tools to identify a precise genetic diagnosis in individuals when conventional testing approaches have been exhausted. We performed targeted long-read sequencing (T-LRS) using adaptive sampling on the Oxford Nanopore platform on 40 individuals, 10 of whom lacked a complete molecular diagnosis. We computationally targeted up to 151 Mbp of sequence per individual and searched for pathogenic substitutions, structural variants, and methylation differences using a single data source. We detected all genomic aberrations (including single-nucleotide variants, copy number changes, repeat expansions, and methylation differences) identified by prior clinical testing. In 8/8 individuals with complex structural rearrangements, T-LRS enabled more precise resolution of the mutation, leading to changes in clinical management in one case. In ten individuals with suspected Mendelian conditions lacking a precise genetic diagnosis, T-LRS identified pathogenic or likely pathogenic variants in six and variants of uncertain significance in two others. T-LRS accurately identifies pathogenic structural variants, resolves complex rearrangements, and identifies Mendelian variants not detected by other technologies. T-LRS represents an efficient and cost-effective strategy to evaluate high-priority genes and regions or complex clinical testing results.


Subjects
Chromosome Aberrations , Cytogenetic Analysis/methods , Genetic Diseases, Inborn/diagnosis , Genetic Diseases, Inborn/genetics , Genetic Predisposition to Disease , Genome, Human , Mutation , DNA Copy Number Variations , Female , Genetic Testing , High-Throughput Nucleotide Sequencing , Humans , Karyotyping , Male , Sequence Analysis, DNA
3.
Genome Res ; 31(5): 877-889, 2021 05.
Article in English | MEDLINE | ID: mdl-33722938

ABSTRACT

High-throughput reporter assays such as self-transcribing active regulatory region sequencing (STARR-seq) have made it possible to measure regulatory element activity across the entire human genome at once. The resulting data, however, present substantial analytical challenges. Here, we identify technical biases that explain most of the variance in STARR-seq data. We then develop a statistical model to correct those biases and to improve detection of regulatory elements. This approach substantially improves precision and recall over current methods, improves detection of both activating and repressive regulatory elements, and controls for false discoveries despite strong local correlations in signal.


Subjects
Enhancer Elements, Genetic , Genome, Human , Bias , High-Throughput Nucleotide Sequencing/methods , Humans
4.
Mol Ther ; 29(11): 3243-3257, 2021 11 03.
Article in English | MEDLINE | ID: mdl-34509668

ABSTRACT

Targeted gene-editing strategies have emerged as promising therapeutic approaches for the permanent treatment of inherited genetic diseases. However, precise gene correction and insertion approaches using homology-directed repair are still limited by low efficiencies. Consequently, many gene-editing strategies have focused on removal or disruption, rather than repair, of genomic DNA. In contrast, homology-independent targeted integration (HITI) has been reported to effectively insert DNA sequences at targeted genomic loci. This approach could be particularly useful for restoring full-length sequences of genes affected by a spectrum of mutations that are also too large to deliver by conventional adeno-associated virus (AAV) vectors. Here, we utilize an AAV-based, HITI-mediated approach for correction of full-length dystrophin expression in a humanized mouse model of Duchenne muscular dystrophy (DMD). We co-deliver CRISPR-Cas9 and a donor DNA sequence to insert the missing human exon 52 into its corresponding position within the DMD gene and achieve full-length dystrophin correction in skeletal and cardiac muscle. Additionally, as a proof-of-concept strategy to correct genetic mutations characterized by diverse patient mutations, we deliver a superexon donor encoding the last 28 exons of the DMD gene as a therapeutic strategy to restore full-length dystrophin in >20% of the DMD patient population. This work highlights the potential of HITI-mediated gene correction for diverse DMD mutations and advances genome editing toward realizing the promise of full-length gene restoration to treat genetic disease.


Subjects
CRISPR-Cas Systems , Dependovirus/genetics , Dystrophin/genetics , Exons , Gene Editing , Genetic Vectors/genetics , Muscular Dystrophy, Duchenne/genetics , Muscular Dystrophy, Duchenne/therapy , Animals , Disease Models, Animal , Gene Expression , Gene Order , Gene Transfer Techniques , Genetic Engineering , Genetic Therapy/methods , Humans , Mice , Mice, Transgenic , Muscle, Skeletal/metabolism , Mutation , Myocardium/metabolism , Virus Integration
5.
Genome Res ; 28(9): 1272-1284, 2018 09.
Article in English | MEDLINE | ID: mdl-30097539

ABSTRACT

Glucocorticoids are potent steroid hormones that regulate immunity and metabolism by activating the transcription factor (TF) activity of glucocorticoid receptor (GR). Previous models have proposed that DNA binding motifs and sites of chromatin accessibility predetermine GR binding and activity. However, there are vast excesses of both features relative to the number of GR binding sites. Thus, these features alone are unlikely to account for the specificity of GR binding and activity. To identify genomic and epigenetic contributions to GR binding specificity and the downstream changes resultant from GR binding, we performed hundreds of genome-wide measurements of TF binding, epigenetic state, and gene expression across a 12-h time course of glucocorticoid exposure. We found that glucocorticoid treatment induces GR to bind to nearly all pre-established enhancers within minutes. However, GR binds to only a small fraction of the set of accessible sites that lack enhancer marks. Once GR is bound to enhancers, a combination of enhancer motif composition and interactions between enhancers then determines the strength and persistence of GR binding, which consequently correlates with dramatic shifts in enhancer activation. Over the course of several hours, highly coordinated changes in TF binding and histone modification occupancy occur specifically within enhancers, and these changes correlate with changes in the expression of nearby genes. Following GR binding, changes in the binding of other TFs precede changes in chromatin accessibility, suggesting that other TFs are also sensitive to genomic features beyond that of accessibility.


Subjects
Enhancer Elements, Genetic , Histone Code , Nucleotide Motifs , Receptors, Glucocorticoid/metabolism , Transcriptional Activation , Cell Line, Tumor , Epigenesis, Genetic , Humans , Protein Binding , Transcription Factors/metabolism
6.
Bioinformatics ; 36(2): 331-338, 2020 01 15.
Article in English | MEDLINE | ID: mdl-31368479

ABSTRACT

MOTIVATION: High-throughput reporter assays dramatically improve our ability to assign function to noncoding genetic variants, by measuring allelic effects on gene expression in the controlled setting of a reporter gene. Unlike genetic association tests, such assays are not confounded by linkage disequilibrium when loci are independently assayed. These methods can thus improve the identification of causal disease mutations. While work continues on improving experimental aspects of these assays, less effort has gone into developing methods for assessing the statistical significance of assay results, particularly in the case of rare variants captured from patient DNA. RESULTS: We describe a Bayesian hierarchical model, called Bayesian Inference of Regulatory Differences, which integrates prior information and explicitly accounts for variability between experimental replicates. The model produces substantially more accurate predictions than existing methods when allele frequencies are low, which is of clear advantage in the search for disease-causing variants in DNA captured from patient cohorts. Using the model, we demonstrate a clear tradeoff between variant sequencing coverage and numbers of biological replicates, and we show that the use of additional biological replicates decreases variance in estimates of effect size, due to the properties of the Poisson-binomial distribution. We also provide a power and sample size calculator, which facilitates decision making in experimental design parameters. AVAILABILITY AND IMPLEMENTATION: The software is freely available from www.geneprediction.org/bird. The experimental design web tool can be accessed at http://67.159.92.22:8080. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Software , Alleles , Bayes Theorem , Gene Frequency , Humans , Linkage Disequilibrium
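The abstract above refers to properties of the Poisson-binomial distribution. As a rough illustration of that distribution only (this is not the published model, and the probabilities used in the usage note are made up), the sketch below computes the probability mass function of the number of successes among independent Bernoulli trials with unequal success probabilities, together with its mean and variance. The function names are hypothetical.

```python
from typing import List, Tuple


def poisson_binomial_pmf(probs: List[float]) -> List[float]:
    """Return P(K = k) for k = 0..n, where K counts successes among
    independent Bernoulli trials with success probabilities `probs`."""
    pmf = [1.0]  # with zero trials, K = 0 with probability 1
    for p in probs:
        nxt = [0.0] * (len(pmf) + 1)
        for k, mass in enumerate(pmf):
            nxt[k] += mass * (1.0 - p)  # the new trial fails
            nxt[k + 1] += mass * p      # the new trial succeeds
        pmf = nxt
    return pmf


def mean_and_variance(probs: List[float]) -> Tuple[float, float]:
    """Closed-form mean and variance of the Poisson-binomial distribution."""
    mean = sum(probs)
    variance = sum(p * (1.0 - p) for p in probs)
    return mean, variance
```

For example, `poisson_binomial_pmf([0.1, 0.4, 0.9])` returns four probabilities (for 0 to 3 successes) that sum to one, and `mean_and_variance` on the same list gives the moments without enumerating outcomes.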
7.
Bioinformatics ; 34(21): 3616-3623, 2018 11 01.
Article in English | MEDLINE | ID: mdl-29701825

ABSTRACT

Motivation: Genetic variation that disrupts gene function by altering gene splicing between individuals can substantially influence traits and disease. In those cases, accurately predicting the effects of genetic variation on splicing can be highly valuable for investigating the mechanisms underlying those traits and diseases. While methods have been developed to generate high quality computational predictions of gene structures in reference genomes, the same methods perform poorly when used to predict the potentially deleterious effects of genetic changes that alter gene splicing between individuals. Underlying that discrepancy in predictive ability are the common assumptions by reference gene finding algorithms that genes are conserved, well-formed and produce functional proteins. Results: We describe a probabilistic approach for predicting recent changes to gene structure that may or may not conserve function. The model is applicable to both coding and non-coding genes, and can be trained on existing gene annotations without requiring curated examples of aberrant splicing. We apply this model to the problem of predicting altered splicing patterns in the genomes of individual humans, and we demonstrate that performing gene-structure prediction without relying on conserved coding features is feasible. The model predicts an unexpected abundance of variants that create de novo splice sites, an observation supported by both simulations and empirical data from RNA-seq experiments. While these de novo splice variants are commonly misinterpreted by other tools as coding or non-coding variants of little or no effect, we find that in some cases they can have large effects on splicing activity and protein products and we propose that they may commonly act as cryptic factors in disease. Availability and implementation: The software is available from geneprediction.org/SGRF. Supplementary information: Supplementary information is available at Bioinformatics online.


Subjects
Exons , RNA Splicing , Software , Humans , Molecular Sequence Annotation , Sequence Analysis, RNA
8.
Genome Res ; 25(8): 1206-14, 2015 Aug.
Article in English | MEDLINE | ID: mdl-26084464

ABSTRACT

We report a novel high-throughput method to empirically quantify individual-specific regulatory element activity at the population scale. The approach combines targeted DNA capture with a high-throughput reporter gene expression assay. As demonstration, we measured the activity of more than 100 putative regulatory elements from 95 individuals in a single experiment. In agreement with previous reports, we found that most genetic variants have weak effects on distal regulatory element activity. Because haplotypes are typically maintained within but not between assayed regulatory elements, the approach can be used to identify causal regulatory haplotypes that likely contribute to human phenotypes. Finally, we demonstrate the utility of the method to functionally fine map causal regulatory variants in regions of high linkage disequilibrium identified by expression quantitative trait loci (eQTL) analyses.


Subjects
Genetic Variation , High-Throughput Nucleotide Sequencing/methods , Regulatory Sequences, Nucleic Acid , Computational Biology/methods , Genome, Human , Haplotypes , Humans , Patient-Specific Modeling , Quantitative Trait Loci
9.
Bioinformatics ; 33(10): 1437-1446, 2017 May 15.
Article in English | MEDLINE | ID: mdl-28011790

ABSTRACT

MOTIVATION: The accurate interpretation of genetic variants is critical for characterizing genotype-phenotype associations. Because the effects of genetic variants can depend strongly on their local genomic context, accurate genome annotations are essential. Furthermore, as some variants have the potential to disrupt or alter gene structure, variant interpretation efforts stand to gain from the use of individualized annotations that account for differences in gene structure between individuals or strains. RESULTS: We describe a suite of software tools for identifying possible functional changes in gene structure that may result from sequence variants. ACE ('Assessing Changes to Exons') converts phased genotype calls to a collection of explicit haplotype sequences, maps transcript annotations onto them, detects gene-structure changes and their possible repercussions, and identifies several classes of possible loss of function. Novel transcripts predicted by ACE are commonly supported by spliced RNA-seq reads, and can be used to improve read alignment and transcript quantification when an individual-specific genome sequence is available. Using publicly available RNA-seq data, we show that ACE predictions confirm earlier results regarding the quantitative effects of nonsense-mediated decay, and we show that predicted loss-of-function events are highly concordant with patterns of intolerance to mutations across the human population. ACE can be readily applied to diverse species including animals and plants, making it a broadly useful tool for use in eukaryotic population-based resequencing projects, particularly for assessing the joint impact of all variants at a locus. AVAILABILITY AND IMPLEMENTATION: ACE is written in open-source C++ and Perl and is available from geneprediction.org/ACE. CONTACT: myandell@genetics.utah.edu or tim.reddy@duke.edu. SUPPLEMENTARY INFORMATION: Supplementary information is available at Bioinformatics online.


Subjects
Genomics/methods , Polymorphism, Genetic , Sequence Analysis, RNA/methods , Software , Animals , Eukaryota/genetics , Exons , Haplotypes , Humans , Mutation , RNA Splicing
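The first step named in the abstract above, converting phased genotype calls into explicit haplotype sequences, can be pictured with a toy example. The sketch below is not ACE's implementation. It assumes single-nucleotide variants only, 0-based positions, and VCF-style phased genotypes such as `0|1`; the `Variant` tuple layout and the function name are hypothetical.

```python
from typing import List, Tuple

# (0-based position, reference allele, alternate allele, phased genotype)
Variant = Tuple[int, str, str, str]


def build_haplotypes(reference: str, variants: List[Variant]) -> Tuple[str, str]:
    """Apply phased single-nucleotide variants to a reference sequence and
    return the two resulting haplotype sequences."""
    haplotypes = [list(reference), list(reference)]
    for pos, ref, alt, genotype in variants:
        if len(ref) != 1 or len(alt) != 1:
            raise ValueError("this sketch only handles single-nucleotide variants")
        if reference[pos] != ref:
            raise ValueError(f"reference allele mismatch at position {pos}")
        alleles = genotype.split("|")  # one allele index per haplotype
        if len(alleles) != 2:
            raise ValueError(f"expected a phased diploid genotype, got {genotype!r}")
        for hap, allele in zip(haplotypes, alleles):
            if allele == "1":
                hap[pos] = alt  # this haplotype carries the alternate allele
    return "".join(haplotypes[0]), "".join(haplotypes[1])
```

With the reference `ACGTACGT` and the variants `(1, "C", "T", "0|1")` and `(4, "A", "G", "1|1")`, the first haplotype carries only the homozygous change and the second carries both. Handling insertions and deletions would additionally require tracking coordinate shifts, which this sketch deliberately leaves out.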
10.
Nat Methods ; 10(7): 630-3, 2013 Jul.
Article in English | MEDLINE | ID: mdl-23708386

ABSTRACT

High-throughput sequencing has opened numerous possibilities for the identification of regulatory RNA-binding events. Cross-linking and immunoprecipitation of Argonaute proteins can pinpoint a microRNA (miRNA) target site within tens of bases but leaves the identity of the miRNA unresolved. A flexible computational framework, microMUMMIE, integrates sequence with cross-linking features and reliably identifies the miRNA family involved in each binding event. It considerably outperforms sequence-only approaches and quantifies the prevalence of noncanonical binding modes.


Subjects
Algorithms , Protein Interaction Mapping/methods , RNA-Binding Proteins/genetics , RNA/genetics , RNA/metabolism , Sequence Analysis, RNA/methods , Systems Integration
11.
Mol Ther ; 23(3): 523-32, 2015 Mar.
Article in English | MEDLINE | ID: mdl-25492562

ABSTRACT

Duchenne muscular dystrophy (DMD) is caused by genetic mutations that result in the absence of dystrophin protein expression. Oligonucleotide-induced exon skipping can restore the dystrophin reading frame and protein production. However, this requires continuous drug administration and may not generate complete skipping of the targeted exon. In this study, we apply genome editing with zinc finger nucleases (ZFNs) to permanently remove essential splicing sequences in exon 51 of the dystrophin gene and thereby exclude exon 51 from the resulting dystrophin transcript. This approach can restore the dystrophin reading frame in ~13% of DMD patient mutations. Transfection of two ZFNs targeted to sites flanking the exon 51 splice acceptor into DMD patient myoblasts led to deletion of this genomic sequence. A clonal population was isolated with this deletion and following differentiation we confirmed loss of exon 51 from the dystrophin mRNA transcript and restoration of dystrophin protein expression. Furthermore, transplantation of corrected cells into immunodeficient mice resulted in human dystrophin expression localized to the sarcolemmal membrane. Finally, we quantified ZFN toxicity in human cells and mutagenesis at predicted off-target sites. This study demonstrates a powerful method to restore the dystrophin reading frame and protein expression by permanently deleting exons.


Subjects
Dystrophin/genetics , Exons , Genetic Therapy/methods , RNA Editing , RNA, Messenger/genetics , Zinc Fingers/genetics , Animals , Base Sequence , Dystrophin/biosynthesis , Dystrophin/chemistry , Electroporation , Endonucleases/genetics , Endonucleases/metabolism , Humans , Mice , Mice, Inbred NOD , Mice, SCID , Molecular Sequence Data , Muscular Dystrophy, Duchenne/genetics , Muscular Dystrophy, Duchenne/metabolism , Muscular Dystrophy, Duchenne/pathology , Muscular Dystrophy, Duchenne/therapy , Myoblasts/metabolism , Myoblasts/pathology , Open Reading Frames , Plasmids/chemistry , Plasmids/genetics , RNA Splicing , RNA, Messenger/chemistry , RNA, Messenger/metabolism , Sequence Deletion
12.
Bioinformatics ; 30(14): 1958-64, 2014 Jul 15.
Article in English | MEDLINE | ID: mdl-24659106

ABSTRACT

MOTIVATION: High-throughput sequencing of RNA in vivo facilitates many applications, not the least of which is the cataloging of variant splice isoforms of protein-coding messenger RNAs. Although many solutions have been proposed for reconstructing putative isoforms from deep sequencing data, these generally take as their substrate the collective alignment structure of RNA-seq reads and ignore the biological signals present in the actual nucleotide sequence. The majority of these solutions are graph-theoretic, relying on a splice graph representing the splicing patterns and exon expression levels indicated by the spliced-alignment process. RESULTS: We show how to augment splice graphs with additional information reflecting the biology of transcription, splicing and translation, to produce what we call an ORF (open reading frame) graph. We then show how ORF graphs can be used to produce isoform predictions with higher accuracy than current state-of-the-art approaches. AVAILABILITY AND IMPLEMENTATION: RSVP is available as C++ source code under an open-source licence: http://ohlerlab.mdc-berlin.de/software/RSVP/.


Subjects
High-Throughput Nucleotide Sequencing/methods , Open Reading Frames , RNA Isoforms/chemistry , Sequence Analysis, RNA/methods , Arabidopsis/genetics , Exons , Humans , RNA Isoforms/metabolism , RNA Splicing , Software
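The abstract above describes augmenting a splice graph with reading-frame information. The following is a much simpler stand-in, not the RSVP algorithm: it enumerates exon paths through a toy splice graph and keeps those whose concatenated sequence forms a complete open reading frame. The exon names, sequences, and function names are all hypothetical.

```python
from typing import Dict, Iterator, List, Set

STOP_CODONS = {"TAA", "TAG", "TGA"}


def enumerate_paths(graph: Dict[str, List[str]], node: str,
                    sinks: Set[str]) -> Iterator[List[str]]:
    """Yield every path from `node` to a sink in an acyclic splice graph."""
    if node in sinks:
        yield [node]
    for successor in graph.get(node, []):
        for rest in enumerate_paths(graph, successor, sinks):
            yield [node] + rest


def is_open_reading_frame(seq: str) -> bool:
    """True if seq starts with ATG, has a length divisible by three,
    ends with a stop codon, and contains no earlier in-frame stop."""
    if len(seq) < 6 or len(seq) % 3 != 0 or not seq.startswith("ATG"):
        return False
    codons = [seq[i:i + 3] for i in range(0, len(seq), 3)]
    return codons[-1] in STOP_CODONS and not any(c in STOP_CODONS for c in codons[:-1])


def coding_isoforms(exons: Dict[str, str], graph: Dict[str, List[str]],
                    source: str, sinks: Set[str]) -> List[List[str]]:
    """Return the exon paths whose spliced sequence is a complete ORF."""
    return [path for path in enumerate_paths(graph, source, sinks)
            if is_open_reading_frame("".join(exons[e] for e in path))]
```

In a toy graph where a two-base exon `e2` may be included between `e1` and `e3`, including it shifts the frame, so only the path that skips it survives the filter. A real ORF graph tracks frame state inside the graph instead of enumerating every path, which matters once the number of paths grows.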
13.
Bioinformatics ; 29(13): i27-35, 2013 Jul 01.
Article in English | MEDLINE | ID: mdl-23812993

ABSTRACT

MOTIVATION: Computational approaches for the annotation of phenotypes from image data have shown promising results across many applications, and provide rich and valuable information for studying gene function and interactions. While data are often available both at high spatial resolution and across multiple time points, phenotypes are frequently annotated independently, for individual time points only. In particular, for the analysis of developmental gene expression patterns, it is biologically sensible when images across multiple time points are jointly accounted for, such that spatial and temporal dependencies are captured simultaneously. METHODS: We describe a discriminative undirected graphical model to label gene-expression time-series image data, with an efficient training and decoding method based on the junction tree algorithm. The approach is based on an effective feature selection technique, consisting of a non-parametric sparse Bayesian factor analysis model. The result is a flexible framework, which can handle large-scale data with noisy incomplete samples, i.e. it can tolerate data missing from individual time points. RESULTS: Using the annotation of gene expression patterns across stages of Drosophila embryonic development as an example, we demonstrate that our method achieves superior accuracy, gained by jointly annotating phenotype sequences, when compared with previous models that annotate each stage in isolation. The experimental results on missing data indicate that our joint learning method successfully annotates genes for which no expression data are available for one or more stages.


Subjects
Gene Expression Profiling/methods , Image Processing, Computer-Assisted/methods , Models, Statistical , Algorithms , Animals , Bayes Theorem , Drosophila/embryology , Drosophila/genetics , Embryonic Development/genetics , Factor Analysis, Statistical , In Situ Hybridization , RNA, Messenger/analysis , RNA, Messenger/chemistry , Statistics, Nonparametric , Vocabulary, Controlled
14.
Nature ; 450(7172): 1096-9, 2007 Dec 13.
Article in English | MEDLINE | ID: mdl-18075594

ABSTRACT

All metazoan eukaryotes express microRNAs (miRNAs), roughly 22-nucleotide regulatory RNAs that can repress the expression of messenger RNAs bearing complementary sequences. Several DNA viruses also express miRNAs in infected cells, suggesting a role in viral replication and pathogenesis. Although specific viral miRNAs have been shown to autoregulate viral mRNAs or downregulate cellular mRNAs, the function of most viral miRNAs remains unknown. Here we report that the miR-K12-11 miRNA encoded by Kaposi's-sarcoma-associated herpes virus (KSHV) shows significant homology to cellular miR-155, including the entire miRNA 'seed' region. Using a range of assays, we show that expression of physiological levels of miR-K12-11 or miR-155 results in the downregulation of an extensive set of common mRNA targets, including genes with known roles in cell growth regulation. Our findings indicate that viral miR-K12-11 functions as an orthologue of cellular miR-155 and probably evolved to exploit a pre-existing gene regulatory pathway in B cells. Moreover, the known aetiological role of miR-155 in B-cell transformation suggests that miR-K12-11 may contribute to the induction of KSHV-positive B-cell tumours in infected patients.


Subjects
Gene Expression Regulation , Herpesvirus 8, Human/genetics , MicroRNAs/genetics , RNA, Viral/genetics , Sequence Homology, Nucleic Acid , 3' Untranslated Regions/genetics , 3' Untranslated Regions/metabolism , B-Lymphocytes/metabolism , B-Lymphocytes/pathology , Basic-Leucine Zipper Transcription Factors/genetics , Basic-Leucine Zipper Transcription Factors/metabolism , Cell Line , Cell Transformation, Viral/genetics , Fanconi Anemia Complementation Group Proteins/genetics , Fanconi Anemia Complementation Group Proteins/metabolism , Gene Expression Profiling , Humans , MicroRNAs/metabolism , Proto-Oncogene Proteins c-fos/genetics , Proto-Oncogene Proteins c-fos/metabolism , RNA, Viral/metabolism , Substrate Specificity
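The abstract above rests on two miRNAs sharing a seed region and therefore a common set of mRNA targets. A minimal sketch of that idea follows, assuming the common convention that the seed spans nucleotides 2 to 8 of the mature miRNA and that a target site is a perfect reverse-complement match in the 3' UTR. The function names are hypothetical, and real target prediction uses considerably more evidence than an exact seed match.

```python
from typing import Dict, List

RNA_COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def seed(mirna: str, start: int = 2, end: int = 8) -> str:
    """Return nucleotides start..end (1-based, inclusive) of a mature miRNA."""
    return mirna[start - 1:end]


def seed_site(mirna: str) -> str:
    """Return the reverse complement of the seed, i.e. the motif a target
    mRNA would carry for perfect seed pairing."""
    return "".join(RNA_COMPLEMENT[base] for base in reversed(seed(mirna)))


def shared_seed_targets(mirna_a: str, mirna_b: str,
                        utrs: Dict[str, str]) -> List[str]:
    """If the two miRNAs share a seed, return the names of the 3' UTRs that
    contain the corresponding seed site; otherwise return an empty list."""
    if seed(mirna_a) != seed(mirna_b):
        return []
    site = seed_site(mirna_a)
    return [name for name, sequence in utrs.items() if site in sequence]
```

The sequences passed in should be RNA (A, C, G, U) written 5' to 3'. Any sequences used to try the functions are placeholders supplied by the reader; none are given in the record above.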
15.
PLoS Comput Biol ; 6(12): e1001037, 2010 Dec 16.
Article in English | MEDLINE | ID: mdl-21187896

ABSTRACT

The computational detection of regulatory elements in DNA is a difficult but important problem impacting our progress in understanding the complex nature of eukaryotic gene regulation. Attempts to utilize cross-species conservation for this task have been hampered both by evolutionary changes of functional sites and poor performance of general-purpose alignment programs when applied to non-coding sequence. We describe a new and flexible framework for modeling binding site evolution in multiple related genomes, based on phylogenetic pair hidden Markov models which explicitly model the gain and loss of binding sites along a phylogeny. We demonstrate the value of this framework for both the alignment of regulatory regions and the inference of precise binding-site locations within those regions. As the underlying formalism is a stochastic, generative model, it can also be used to simulate the evolution of regulatory elements. Our implementation is scalable in terms of numbers of species and sequence lengths and can produce alignments and binding-site predictions with accuracy rivaling or exceeding current systems that specialize in only alignment or only binding-site prediction. We demonstrate the validity and power of various model components on extensive simulations of realistic sequence data and apply a specific model to study Drosophila enhancers in as many as ten related genomes and in the presence of gain and loss of binding sites. Different models and modeling assumptions can be easily specified, thus providing an invaluable tool for the exploration of biological hypotheses that can drive improvements in our understanding of the mechanisms and evolution of gene regulation.


Subjects
Computational Biology/methods , Evolution, Molecular , Markov Chains , Regulatory Elements, Transcriptional/genetics , Sequence Alignment/methods , Animals , Base Sequence , Computer Simulation , Drosophila melanogaster/genetics , Gene Expression Regulation , Molecular Sequence Data , Phylogeny , ROC Curve , Sequence Analysis, DNA
16.
Bioinformatics ; 25(2): 175-82, 2009 Jan 15.
Article in English | MEDLINE | ID: mdl-19017657

ABSTRACT

MOTIVATION: The modeling of conservation patterns in genomic DNA has become increasingly popular for a number of bioinformatic applications. While several systems developed to date incorporate context-dependence in their substitution models, the impact on computational complexity and generalization ability of the resulting higher order models invites the question of whether simpler approaches to context modeling might permit appreciable reductions in model complexity and computational cost, without sacrificing prediction accuracy. RESULTS: We formulate several alternative methods for context modeling based on windowed Bayesian networks, and compare their effects on both accuracy and computational complexity for the task of discriminating functionally distinct segments in vertebrate DNA. Our results show that substantial reductions in the complexity of both the model and the associated inference algorithm can be achieved without reducing predictive accuracy.


Subjects
Sequence Analysis, DNA/methods , Algorithms , Bayes Theorem , Computer Simulation , DNA/chemistry , Genome , Models, Genetic , Software
17.
PLoS Biol ; 4(9): e286, 2006 Sep.
Article in English | MEDLINE | ID: mdl-16933976

ABSTRACT

The ciliate Tetrahymena thermophila is a model organism for molecular and cellular biology. Like other ciliates, this species has separate germline and soma functions that are embodied by distinct nuclei within a single cell. The germline-like micronucleus (MIC) has its genome held in reserve for sexual reproduction. The soma-like macronucleus (MAC), which possesses a genome processed from that of the MIC, is the center of gene expression and does not directly contribute DNA to sexual progeny. We report here the shotgun sequencing, assembly, and analysis of the MAC genome of T. thermophila, which is approximately 104 Mb in length and composed of approximately 225 chromosomes. Overall, the gene set is robust, with more than 27,000 predicted protein-coding genes, 15,000 of which have strong matches to genes in other organisms. The functional diversity encoded by these genes is substantial and reflects the complexity of processes required for a free-living, predatory, single-celled organism. This is highlighted by the abundance of lineage-specific duplications of genes with predicted roles in sensing and responding to environmental conditions (e.g., kinases), using diverse resources (e.g., proteases and transporters), and generating structural complexity (e.g., kinesins and dyneins). In contrast to the other lineages of alveolates (apicomplexans and dinoflagellates), no compelling evidence could be found for plastid-derived genes in the genome. UGA, the only T. thermophila stop codon, is used in some genes to encode selenocysteine, thus making this organism the first known with the potential to translate all 64 codons in nuclear genes into amino acids. We present genomic evidence supporting the hypothesis that the excision of DNA from the MIC to generate the MAC specifically targets foreign DNA as a form of genome self-defense. The combination of the genome sequence, the functional diversity encoded therein, and the presence of some pathways missing from other model organisms makes T. thermophila an ideal model for functional genomic studies to address biological, biomedical, and biotechnological questions of fundamental importance.


Subjects
Genome, Protozoan , Macronucleus/genetics , Models, Biological , Tetrahymena thermophila/genetics , Animals , Cells, Cultured , Chromosome Mapping/methods , Chromosomes , Databases, Genetic , Eukaryotic Cells/physiology , Evolution, Molecular , Micronucleus, Germline/genetics , Models, Animal , Phylogeny , Signal Transduction
18.
Genome Biol Evol ; 11(10): 3035-3053, 2019 10 01.
Article in English | MEDLINE | ID: mdl-31599933

ABSTRACT

Changes in transcriptional regulation are thought to be a major contributor to the evolution of phenotypic traits, but the contribution of changes in chromatin accessibility to the evolution of gene expression remains almost entirely unknown. To address this important gap in knowledge, we developed a new method to identify DNase I Hypersensitive (DHS) sites with differential chromatin accessibility between species using a joint modeling approach. Our method overcomes several limitations inherent to conventional threshold-based pairwise comparisons that become increasingly apparent as the number of species analyzed rises. Our approach employs a single quantitative test which is more sensitive than existing pairwise methods. To illustrate, we applied our joint approach to DHS sites in fibroblast cells from five primates (human, chimpanzee, gorilla, orangutan, and rhesus macaque). We identified 89,744 DHS sites, of which 41% are identified as differential between species using the joint model compared with 33% using the conventional pairwise approach. The joint model provides a principled approach to distinguishing single from multiple chromatin accessibility changes among species. We found that nondifferential DHS sites are enriched for nucleotide conservation. Differential DHS sites with decreased chromatin accessibility relative to rhesus macaque occur more commonly near transcription start sites (TSS), while those with increased chromatin accessibility occur more commonly distal to TSS. Further, differential DHS sites near TSS are less cell type-specific than more distal regulatory elements. Taken together, these results point to distinct classes of DHS sites, each with distinct characteristics of selection, genomic location, and cell type specificity.


Subjects
Chromatin/chemistry , Evolution, Molecular , Animals , Cell Line , Deoxyribonuclease I , Genomics , Gorilla gorilla/genetics , Humans , Macaca mulatta/genetics , Models, Genetic , Pan troglodytes/genetics , Pongo/genetics , Transcription Initiation Site
20.
Nat Commun ; 9(1): 5317, 2018 12 21.
Article in English | MEDLINE | ID: mdl-30575722

ABSTRACT

Environmental stimuli commonly act via changes in gene regulation. Human-genome-scale assays to measure such responses are indirect or require knowledge of the transcription factors (TFs) involved. Here, we present the use of human genome-wide high-throughput reporter assays to measure environmentally-responsive regulatory element activity. We focus on responses to glucocorticoids (GCs), an important class of pharmaceuticals and a paradigmatic genomic response model. We assay GC-responsive regulatory activity across >10^8 unique DNA fragments, covering the human genome at >50×. Those assays directly detected thousands of GC-responsive regulatory elements genome-wide. We then validate those findings with measurements of transcription factor occupancy, histone modifications, chromatin accessibility, and gene expression. We also detect allele-specific environmental responses. Notably, the assays did not require knowledge of GC response mechanisms. Thus, this technology can be used to agnostically quantify genomic responses for which the underlying mechanism remains unknown.


Subjects
Gene Expression Regulation/drug effects , Genome, Human , Glucocorticoids/pharmacology , Regulatory Elements, Transcriptional/drug effects , Gene-Environment Interaction , High-Throughput Screening Assays , Humans