Search | Biblioteca Virtual en Salud Fronteriza

1.

Direct GR Binding Sites Potentiate Clusters of TF Binding across the Human Genome.

Vockley, Christopher M; D'Ippolito, Anthony M; McDowell, Ian C; Majoros, William H; Safi, Alexias; Song, Lingyun; Crawford, Gregory E; Reddy, Timothy E.

Cell ; 166(5): 1269-1281.e19, 2016 Aug 25.

Article in English | MEDLINE | ID: mdl-27565349

ABSTRACT

The glucocorticoid receptor (GR) binds the human genome at >10,000 sites but only regulates the expression of hundreds of genes. To determine the functional effect of each site, we measured the glucocorticoid (GC) responsive activity of nearly all GR binding sites (GBSs) captured using chromatin immunoprecipitation (ChIP) in A549 cells. 13% of GBSs assayed had GC-induced activity. The responsive sites were defined by direct GR binding via a GC response element (GRE) and exclusively increased reporter-gene expression. Meanwhile, most GBSs lacked GC-induced reporter activity. The non-responsive sites had epigenetic features of steady-state enhancers and clustered around direct GBSs. Together, our data support a model in which clusters of GBSs observed with ChIP-seq reflect interactions between direct and tethered GBSs over tens of kilobases. We further show that those interactions can synergistically modulate the activity of direct GBSs and may therefore play a major role in driving gene activation in response to GCs.

Subject(s)

Human Genome, Glucocorticoids/metabolism, Glucocorticoid Receptors/metabolism, Transcription Factors/metabolism, Transcriptional Activation, A549 Cells, Binding Sites/drug effects, Chromatin Immunoprecipitation, Dexamethasone/metabolism, Dexamethasone/pharmacology, Reporter Genes, Glucocorticoids/pharmacology, Humans, Protein Binding/drug effects, Response Elements

2.

Targeted long-read sequencing identifies missing disease-causing variation.

Miller, Danny E; Sulovari, Arvis; Wang, Tianyun; Loucks, Hailey; Hoekzema, Kendra; Munson, Katherine M; Lewis, Alexandra P; Fuerte, Edith P Almanza; Paschal, Catherine R; Walsh, Tom; Thies, Jenny; Bennett, James T; Glass, Ian; Dipple, Katrina M; Patterson, Karynne; Bonkowski, Emily S; Nelson, Zoe; Squire, Audrey; Sikes, Megan; Beckman, Erika; Bennett, Robin L; Earl, Dawn; Lee, Winston; Allikmets, Rando; Perlman, Seth J; Chow, Penny; Hing, Anne V; Wenger, Tara L; Adam, Margaret P; Sun, Angela; Lam, Christina; Chang, Irene; Zou, Xue; Austin, Stephanie L; Huggins, Erin; Safi, Alexias; Iyengar, Apoorva K; Reddy, Timothy E; Majoros, William H; Allen, Andrew S; Crawford, Gregory E; Kishnani, Priya S; King, Mary-Claire; Cherry, Tim; Chong, Jessica X; Bamshad, Michael J; Nickerson, Deborah A; Mefford, Heather C; Doherty, Dan; Eichler, Evan E.

Am J Hum Genet ; 108(8): 1436-1449, 2021 Aug 05.

Article in English | MEDLINE | ID: mdl-34216551

ABSTRACT

Despite widespread clinical genetic testing, many individuals with suspected genetic conditions lack a precise diagnosis, limiting their opportunity to take advantage of state-of-the-art treatments. In some cases, testing reveals difficult-to-evaluate structural differences, candidate variants that do not fully explain the phenotype, single pathogenic variants in recessive disorders, or no variants in genes of interest. Thus, there is a need for better tools to identify a precise genetic diagnosis in individuals when conventional testing approaches have been exhausted. We performed targeted long-read sequencing (T-LRS) using adaptive sampling on the Oxford Nanopore platform on 40 individuals, 10 of whom lacked a complete molecular diagnosis. We computationally targeted up to 151 Mbp of sequence per individual and searched for pathogenic substitutions, structural variants, and methylation differences using a single data source. We detected all genomic aberrations (including single-nucleotide variants, copy number changes, repeat expansions, and methylation differences) identified by prior clinical testing. In 8/8 individuals with complex structural rearrangements, T-LRS enabled more precise resolution of the mutation, leading to changes in clinical management in one case. In ten individuals with suspected Mendelian conditions lacking a precise genetic diagnosis, T-LRS identified pathogenic or likely pathogenic variants in six and variants of uncertain significance in two others. T-LRS accurately identifies pathogenic structural variants, resolves complex rearrangements, and identifies Mendelian variants not detected by other technologies. T-LRS represents an efficient and cost-effective strategy to evaluate high-priority genes and regions or complex clinical testing results.

Subject(s)

Chromosome Aberrations, Cytogenetic Analysis/methods, Inborn Genetic Diseases/diagnosis, Inborn Genetic Diseases/genetics, Genetic Predisposition to Disease, Human Genome, Mutation, DNA Copy Number Variations, Female, Genetic Testing, High-Throughput Nucleotide Sequencing, Humans, Karyotyping, Male, DNA Sequence Analysis

3.

Correcting signal biases and detecting regulatory elements in STARR-seq data.

Kim, Young-Sook; Johnson, Graham D; Seo, Jungkyun; Barrera, Alejandro; Cowart, Thomas N; Majoros, William H; Ochoa, Alejandro; Allen, Andrew S; Reddy, Timothy E.

Genome Res ; 31(5): 877-889, 2021 May.

Article in English | MEDLINE | ID: mdl-33722938

ABSTRACT

High-throughput reporter assays such as self-transcribing active regulatory region sequencing (STARR-seq) have made it possible to measure regulatory element activity across the entire human genome at once. The resulting data, however, present substantial analytical challenges. Here, we identify technical biases that explain most of the variance in STARR-seq data. We then develop a statistical model to correct those biases and to improve detection of regulatory elements. This approach substantially improves precision and recall over current methods, improves detection of both activating and repressive regulatory elements, and controls for false discoveries despite strong local correlations in signal.

Subject(s)

Genetic Enhancer Elements, Human Genome, Bias, High-Throughput Nucleotide Sequencing/methods, Humans

4.

Promoter Deletion Leading to Allele Specific Expression in a Genetically Unsolved Case of Primary Ciliary Dyskinesia.

Beaman, M Makenzie; Yin, Weining; Smith, Amanda J; Sears, Patrick R; Leigh, Margaret W; Ferkol, Thomas W; Kearney, Brendan; Olivier, Kenneth N; Kimple, Adam J; Clarke, Shannon; Huggins, Erin; Nading, Erica; Jung, Seung-Hye; Iyengar, Apoorva K; Zou, Xue; Dang, Hong; Barrera, Alejandro; Majoros, William H; Rehder, Catherine W; Reddy, Timothy E; Ostrowski, Lawrence E; Allen, Andrew S; Knowles, Michael R; Zariwala, Maimoona A; Crawford, Gregory E.

Am J Med Genet A ; : e63880, 2024 Oct 04.

Article in English | MEDLINE | ID: mdl-39364610

ABSTRACT

Variation in the non-coding genome represents an understudied mechanism of disease and it remains challenging to predict if single nucleotide variants, small insertions and deletions, or structural variants in non-coding genomic regions will be detrimental. Our approach using complementary RNA-seq and targeted long-read DNA sequencing can prioritize identification of non-coding variants that lead to disease via alteration of gene splicing or expression. We have identified a patient with primary ciliary dyskinesia with a pathogenic coding variant on one allele of the SPAG1 gene, while the second allele appears normal by whole exome sequencing despite an autosomal recessive inheritance pattern. RNA sequencing revealed reduced SPAG1 transcript levels and exclusive allele specific expression of the known pathogenic allele, suggesting the presence of a non-coding variant on the second allele that impacts transcription. Targeted long-read DNA sequencing identified a heterozygous 3 kilobase deletion of the 5' untranslated region of SPAG1, overlapping the promoter and first non-coding exon. This non-coding deletion was missed by whole exome sequencing and gene-specific deletion/duplication analysis, highlighting the importance of investigating the non-coding genome in patients with "missing" disease-causing variation. This paradigm demonstrates the utility of both RNA and long-read DNA sequencing in identifying pathogenic non-coding variants in patients with unexplained genetic disease.

5.

Full-length dystrophin restoration via targeted exon integration by AAV-CRISPR in a humanized mouse model of Duchenne muscular dystrophy.

Pickar-Oliver, Adrian; Gough, Veronica; Bohning, Joel D; Liu, Siyan; Robinson-Hamm, Jacqueline N; Daniels, Heather; Majoros, William H; Devlin, Garth; Asokan, Aravind; Gersbach, Charles A.

Mol Ther ; 29(11): 3243-3257, 2021 Nov 03.

Article in English | MEDLINE | ID: mdl-34509668

ABSTRACT

Targeted gene-editing strategies have emerged as promising therapeutic approaches for the permanent treatment of inherited genetic diseases. However, precise gene correction and insertion approaches using homology-directed repair are still limited by low efficiencies. Consequently, many gene-editing strategies have focused on removal or disruption, rather than repair, of genomic DNA. In contrast, homology-independent targeted integration (HITI) has been reported to effectively insert DNA sequences at targeted genomic loci. This approach could be particularly useful for restoring full-length sequences of genes affected by a spectrum of mutations that are also too large to deliver by conventional adeno-associated virus (AAV) vectors. Here, we utilize an AAV-based, HITI-mediated approach for correction of full-length dystrophin expression in a humanized mouse model of Duchenne muscular dystrophy (DMD). We co-deliver CRISPR-Cas9 and a donor DNA sequence to insert the missing human exon 52 into its corresponding position within the DMD gene and achieve full-length dystrophin correction in skeletal and cardiac muscle. Additionally, as a proof-of-concept strategy to correct genetic mutations characterized by diverse patient mutations, we deliver a superexon donor encoding the last 28 exons of the DMD gene as a therapeutic strategy to restore full-length dystrophin in >20% of the DMD patient population. This work highlights the potential of HITI-mediated gene correction for diverse DMD mutations and advances genome editing toward realizing the promise of full-length gene restoration to treat genetic disease.

Subject(s)

CRISPR-Cas Systems, Dependovirus/genetics, Dystrophin/genetics, Exons, Gene Editing, Genetic Vectors/genetics, Duchenne Muscular Dystrophy/genetics, Duchenne Muscular Dystrophy/therapy, Animals, Animal Disease Models, Gene Expression, Gene Order, Gene Transfer Techniques, Genetic Engineering, Genetic Therapy/methods, Humans, Mice, Transgenic Mice, Skeletal Muscle/metabolism, Mutation, Myocardium/metabolism, Virus Integration

6.

Glucocorticoid receptor recruits to enhancers and drives activation by motif-directed binding.

McDowell, Ian C; Barrera, Alejandro; D'Ippolito, Anthony M; Vockley, Christopher M; Hong, Linda K; Leichter, Sarah M; Bartelt, Luke C; Majoros, William H; Song, Lingyun; Safi, Alexias; Koçak, D Dewran; Gersbach, Charles A; Hartemink, Alexander J; Crawford, Gregory E; Engelhardt, Barbara E; Reddy, Timothy E.

Genome Res ; 28(9): 1272-1284, 2018 Sep.

Article in English | MEDLINE | ID: mdl-30097539

ABSTRACT

Glucocorticoids are potent steroid hormones that regulate immunity and metabolism by activating the transcription factor (TF) activity of glucocorticoid receptor (GR). Previous models have proposed that DNA binding motifs and sites of chromatin accessibility predetermine GR binding and activity. However, there are vast excesses of both features relative to the number of GR binding sites. Thus, these features alone are unlikely to account for the specificity of GR binding and activity. To identify genomic and epigenetic contributions to GR binding specificity and the downstream changes resultant from GR binding, we performed hundreds of genome-wide measurements of TF binding, epigenetic state, and gene expression across a 12-h time course of glucocorticoid exposure. We found that glucocorticoid treatment induces GR to bind to nearly all pre-established enhancers within minutes. However, GR binds to only a small fraction of the set of accessible sites that lack enhancer marks. Once GR is bound to enhancers, a combination of enhancer motif composition and interactions between enhancers then determines the strength and persistence of GR binding, which consequently correlates with dramatic shifts in enhancer activation. Over the course of several hours, highly coordinated changes in TF binding and histone modification occupancy occur specifically within enhancers, and these changes correlate with changes in the expression of nearby genes. Following GR binding, changes in the binding of other TFs precede changes in chromatin accessibility, suggesting that other TFs are also sensitive to genomic features beyond that of accessibility.

Subject(s)

Genetic Enhancer Elements, Histone Code, Nucleotide Motifs, Glucocorticoid Receptors/metabolism, Transcriptional Activation, Tumor Cell Line, Genetic Epigenesis, Humans, Protein Binding, Transcription Factors/metabolism

7.

Bayesian estimation of genetic regulatory effects in high-throughput reporter assays.

Majoros, William H; Kim, Young-Sook; Barrera, Alejandro; Li, Fan; Wang, Xingyan; Cunningham, Sarah J; Johnson, Graham D; Guo, Cong; Lowe, William L; Scholtens, Denise M; Hayes, M Geoffrey; Reddy, Timothy E; Allen, Andrew S.

Bioinformatics ; 36(2): 331-338, 2020 Jan 15.

Article in English | MEDLINE | ID: mdl-31368479

ABSTRACT

MOTIVATION: High-throughput reporter assays dramatically improve our ability to assign function to noncoding genetic variants, by measuring allelic effects on gene expression in the controlled setting of a reporter gene. Unlike genetic association tests, such assays are not confounded by linkage disequilibrium when loci are independently assayed. These methods can thus improve the identification of causal disease mutations. While work continues on improving experimental aspects of these assays, less effort has gone into developing methods for assessing the statistical significance of assay results, particularly in the case of rare variants captured from patient DNA. RESULTS: We describe a Bayesian hierarchical model, called Bayesian Inference of Regulatory Differences, which integrates prior information and explicitly accounts for variability between experimental replicates. The model produces substantially more accurate predictions than existing methods when allele frequencies are low, which is of clear advantage in the search for disease-causing variants in DNA captured from patient cohorts. Using the model, we demonstrate a clear tradeoff between variant sequencing coverage and numbers of biological replicates, and we show that the use of additional biological replicates decreases variance in estimates of effect size, due to the properties of the Poisson-binomial distribution. We also provide a power and sample size calculator, which facilitates decision making in experimental design parameters. AVAILABILITY AND IMPLEMENTATION: The software is freely available from www.geneprediction.org/bird. The experimental design web tool can be accessed at http://67.159.92.22:8080. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subject(s)

Software, Alleles, Bayes Theorem, Gene Frequency, Humans, Linkage Disequilibrium

8.

Predicting gene structure changes resulting from genetic variants via exon definition features.

Majoros, William H; Holt, Carson; Campbell, Michael S; Ware, Doreen; Yandell, Mark; Reddy, Timothy E.

Bioinformatics ; 34(21): 3616-3623, 2018 Nov 01.

Article in English | MEDLINE | ID: mdl-29701825

ABSTRACT

Motivation: Genetic variation that disrupts gene function by altering gene splicing between individuals can substantially influence traits and disease. In those cases, accurately predicting the effects of genetic variation on splicing can be highly valuable for investigating the mechanisms underlying those traits and diseases. While methods have been developed to generate high quality computational predictions of gene structures in reference genomes, the same methods perform poorly when used to predict the potentially deleterious effects of genetic changes that alter gene splicing between individuals. Underlying that discrepancy in predictive ability are the common assumptions by reference gene finding algorithms that genes are conserved, well-formed and produce functional proteins. Results: We describe a probabilistic approach for predicting recent changes to gene structure that may or may not conserve function. The model is applicable to both coding and non-coding genes, and can be trained on existing gene annotations without requiring curated examples of aberrant splicing. We apply this model to the problem of predicting altered splicing patterns in the genomes of individual humans, and we demonstrate that performing gene-structure prediction without relying on conserved coding features is feasible. The model predicts an unexpected abundance of variants that create de novo splice sites, an observation supported by both simulations and empirical data from RNA-seq experiments. While these de novo splice variants are commonly misinterpreted by other tools as coding or non-coding variants of little or no effect, we find that in some cases they can have large effects on splicing activity and protein products and we propose that they may commonly act as cryptic factors in disease. Availability and implementation: The software is available from geneprediction.org/SGRF. Supplementary information: Supplementary information is available at Bioinformatics online.

Subject(s)

Exons, RNA Splicing, Software, Humans, Molecular Sequence Annotation, RNA Sequence Analysis

9.

Massively parallel quantification of the regulatory effects of noncoding genetic variation in a human cohort.

Vockley, Christopher M; Guo, Cong; Majoros, William H; Nodzenski, Michael; Scholtens, Denise M; Hayes, M Geoffrey; Lowe, William L; Reddy, Timothy E.

Genome Res ; 25(8): 1206-14, 2015 Aug.

Article in English | MEDLINE | ID: mdl-26084464

ABSTRACT

We report a novel high-throughput method to empirically quantify individual-specific regulatory element activity at the population scale. The approach combines targeted DNA capture with a high-throughput reporter gene expression assay. As demonstration, we measured the activity of more than 100 putative regulatory elements from 95 individuals in a single experiment. In agreement with previous reports, we found that most genetic variants have weak effects on distal regulatory element activity. Because haplotypes are typically maintained within but not between assayed regulatory elements, the approach can be used to identify causal regulatory haplotypes that likely contribute to human phenotypes. Finally, we demonstrate the utility of the method to functionally fine map causal regulatory variants in regions of high linkage disequilibrium identified by expression quantitative trait loci (eQTL) analyses.

Subject(s)

Genetic Variation, High-Throughput Nucleotide Sequencing/methods, Nucleic Acid Regulatory Sequences, Computational Biology/methods, Human Genome, Haplotypes, Humans, Patient-Specific Modeling, Quantitative Trait Loci

10.

High-throughput interpretation of gene structure changes in human and nonhuman resequencing data, using ACE.

Majoros, William H; Campbell, Michael S; Holt, Carson; DeNardo, Erin K; Ware, Doreen; Allen, Andrew S; Yandell, Mark; Reddy, Timothy E.

Bioinformatics ; 33(10): 1437-1446, 2017 May 15.

Article in English | MEDLINE | ID: mdl-28011790

ABSTRACT

MOTIVATION: The accurate interpretation of genetic variants is critical for characterizing genotype-phenotype associations. Because the effects of genetic variants can depend strongly on their local genomic context, accurate genome annotations are essential. Furthermore, as some variants have the potential to disrupt or alter gene structure, variant interpretation efforts stand to gain from the use of individualized annotations that account for differences in gene structure between individuals or strains. RESULTS: We describe a suite of software tools for identifying possible functional changes in gene structure that may result from sequence variants. ACE ('Assessing Changes to Exons') converts phased genotype calls to a collection of explicit haplotype sequences, maps transcript annotations onto them, detects gene-structure changes and their possible repercussions, and identifies several classes of possible loss of function. Novel transcripts predicted by ACE are commonly supported by spliced RNA-seq reads, and can be used to improve read alignment and transcript quantification when an individual-specific genome sequence is available. Using publicly available RNA-seq data, we show that ACE predictions confirm earlier results regarding the quantitative effects of nonsense-mediated decay, and we show that predicted loss-of-function events are highly concordant with patterns of intolerance to mutations across the human population. ACE can be readily applied to diverse species including animals and plants, making it a broadly useful tool for use in eukaryotic population-based resequencing projects, particularly for assessing the joint impact of all variants at a locus. AVAILABILITY AND IMPLEMENTATION: ACE is written in open-source C++ and Perl and is available from geneprediction.org/ACE. CONTACT: myandell@genetics.utah.edu or tim.reddy@duke.edu. SUPPLEMENTARY INFORMATION: Supplementary information is available at Bioinformatics online.

Subject(s)

Genomics/methods, Genetic Polymorphism, RNA Sequence Analysis/methods, Software, Animals, Eukaryota/genetics, Exons, Haplotypes, Humans, Mutation, RNA Splicing

11.

MicroRNA target site identification by integrating sequence and binding information.

Majoros, William H; Lekprasert, Parawee; Mukherjee, Neelanjan; Skalsky, Rebecca L; Corcoran, David L; Cullen, Bryan R; Ohler, Uwe.

Nat Methods ; 10(7): 630-3, 2013 Jul.

Article in English | MEDLINE | ID: mdl-23708386

ABSTRACT

High-throughput sequencing has opened numerous possibilities for the identification of regulatory RNA-binding events. Cross-linking and immunoprecipitation of Argonaute proteins can pinpoint a microRNA (miRNA) target site within tens of bases but leaves the identity of the miRNA unresolved. A flexible computational framework, microMUMMIE, integrates sequence with cross-linking features and reliably identifies the miRNA family involved in each binding event. It considerably outperforms sequence-only approaches and quantifies the prevalence of noncanonical binding modes.

Subject(s)

Algorithms, Protein Interaction Mapping/methods, RNA-Binding Proteins/genetics, RNA/genetics, RNA/metabolism, RNA Sequence Analysis/methods, Systems Integration

12.

Correction of dystrophin expression in cells from Duchenne muscular dystrophy patients through genomic excision of exon 51 by zinc finger nucleases.

Ousterout, David G; Kabadi, Ami M; Thakore, Pratiksha I; Perez-Pinera, Pablo; Brown, Matthew T; Majoros, William H; Reddy, Timothy E; Gersbach, Charles A.

Mol Ther ; 23(3): 523-32, 2015 Mar.

Article in English | MEDLINE | ID: mdl-25492562

ABSTRACT

Duchenne muscular dystrophy (DMD) is caused by genetic mutations that result in the absence of dystrophin protein expression. Oligonucleotide-induced exon skipping can restore the dystrophin reading frame and protein production. However, this requires continuous drug administration and may not generate complete skipping of the targeted exon. In this study, we apply genome editing with zinc finger nucleases (ZFNs) to permanently remove essential splicing sequences in exon 51 of the dystrophin gene and thereby exclude exon 51 from the resulting dystrophin transcript. This approach can restore the dystrophin reading frame in ~13% of DMD patient mutations. Transfection of two ZFNs targeted to sites flanking the exon 51 splice acceptor into DMD patient myoblasts led to deletion of this genomic sequence. A clonal population was isolated with this deletion and following differentiation we confirmed loss of exon 51 from the dystrophin mRNA transcript and restoration of dystrophin protein expression. Furthermore, transplantation of corrected cells into immunodeficient mice resulted in human dystrophin expression localized to the sarcolemmal membrane. Finally, we quantified ZFN toxicity in human cells and mutagenesis at predicted off-target sites. This study demonstrates a powerful method to restore the dystrophin reading frame and protein expression by permanently deleting exons.

Subject(s)

Dystrophin/genetics, Exons, Genetic Therapy/methods, RNA Editing, Messenger RNA/genetics, Zinc Fingers/genetics, Animals, Base Sequence, Dystrophin/biosynthesis, Dystrophin/chemistry, Electroporation, Endonucleases/genetics, Endonucleases/metabolism, Humans, Mice, Inbred NOD Mice, SCID Mice, Molecular Sequence Data, Duchenne Muscular Dystrophy/genetics, Duchenne Muscular Dystrophy/metabolism, Duchenne Muscular Dystrophy/pathology, Duchenne Muscular Dystrophy/therapy, Myoblasts/metabolism, Myoblasts/pathology, Open Reading Frames, Plasmids/chemistry, Plasmids/genetics, RNA Splicing, Messenger RNA/chemistry, Messenger RNA/metabolism, Sequence Deletion
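
The frame-restoration idea in the abstract above can be illustrated with a minimal sketch: excluding an additional exon restores the reading frame when the total number of coding bases missing from the transcript becomes a multiple of three. The exon lengths, the example patient deletion, and the function name below are assumptions made for illustration and are not taken from the study.

```python
def is_in_frame(removed_exon_lengths):
    """Return True if the removed coding bases sum to a multiple of three."""
    return sum(removed_exon_lengths) % 3 == 0


# Illustrative exon lengths in base pairs (assumed values for this sketch).
exon_lengths = {50: 109, 51: 233}

# Hypothetical patient deletion of exon 50, then the same deletion plus
# exclusion of exon 51 from the transcript.
patient_deletion = [50]
after_editing = patient_deletion + [51]

frame_before = is_in_frame([exon_lengths[e] for e in patient_deletion])
frame_after = is_in_frame([exon_lengths[e] for e in after_editing])
print(frame_before, frame_after)
```

The sketch treats every listed exon as fully coding, which is a simplification; it only checks the arithmetic condition for frame preservation.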

13.

Improved transcript isoform discovery using ORF graphs.

Majoros, William H; Lebeck, Niel; Ohler, Uwe; Li, Song.

Bioinformatics ; 30(14): 1958-64, 2014 Jul 15.

Article in English | MEDLINE | ID: mdl-24659106

ABSTRACT

MOTIVATION: High-throughput sequencing of RNA in vivo facilitates many applications, not the least of which is the cataloging of variant splice isoforms of protein-coding messenger RNAs. Although many solutions have been proposed for reconstructing putative isoforms from deep sequencing data, these generally take as their substrate the collective alignment structure of RNA-seq reads and ignore the biological signals present in the actual nucleotide sequence. The majority of these solutions are graph-theoretic, relying on a splice graph representing the splicing patterns and exon expression levels indicated by the spliced-alignment process. RESULTS: We show how to augment splice graphs with additional information reflecting the biology of transcription, splicing and translation, to produce what we call an ORF (open reading frame) graph. We then show how ORF graphs can be used to produce isoform predictions with higher accuracy than current state-of-the-art approaches. AVAILABILITY AND IMPLEMENTATION: RSVP is available as C++ source code under an open-source licence: http://ohlerlab.mdc-berlin.de/software/RSVP/.

Subject(s)

High-Throughput Nucleotide Sequencing/methods, Open Reading Frames, RNA Isoforms/chemistry, RNA Sequence Analysis/methods, Arabidopsis/genetics, Exons, Humans, RNA Isoforms/metabolism, RNA Splicing, Software

14.

Automated annotation of gene expression image sequences via non-parametric factor analysis and conditional random fields.

Pruteanu-Malinici, Iulian; Majoros, William H; Ohler, Uwe.

Bioinformatics ; 29(13): i27-35, 2013 Jul 01.

Article in English | MEDLINE | ID: mdl-23812993

ABSTRACT

MOTIVATION: Computational approaches for the annotation of phenotypes from image data have shown promising results across many applications, and provide rich and valuable information for studying gene function and interactions. While data are often available both at high spatial resolution and across multiple time points, phenotypes are frequently annotated independently, for individual time points only. In particular, for the analysis of developmental gene expression patterns, it is biologically sensible when images across multiple time points are jointly accounted for, such that spatial and temporal dependencies are captured simultaneously. METHODS: We describe a discriminative undirected graphical model to label gene-expression time-series image data, with an efficient training and decoding method based on the junction tree algorithm. The approach is based on an effective feature selection technique, consisting of a non-parametric sparse Bayesian factor analysis model. The result is a flexible framework, which can handle large-scale data with noisy incomplete samples, i.e. it can tolerate data missing from individual time points. RESULTS: Using the annotation of gene expression patterns across stages of Drosophila embryonic development as an example, we demonstrate that our method achieves superior accuracy, gained by jointly annotating phenotype sequences, when compared with previous models that annotate each stage in isolation. The experimental results on missing data indicate that our joint learning method successfully annotates genes for which no expression data are available for one or more stages.

Subject(s)

Gene Expression Profiling/methods, Computer-Assisted Image Processing/methods, Statistical Models, Algorithms, Animals, Bayes Theorem, Drosophila/embryology, Drosophila/genetics, Embryonic Development/genetics, Statistical Factor Analysis, In Situ Hybridization, Messenger RNA/analysis, Messenger RNA/chemistry, Nonparametric Statistics, Controlled Vocabulary

15.

Bayesian Estimation of Allele-Specific Expression in the Presence of Phasing Uncertainty.

Zou, Xue; Gomez, Zachary W; Reddy, Timothy E; Allen, Andrew S; Majoros, William H.

bioRxiv ; 2024 Aug 13.

Article in English | MEDLINE | ID: mdl-39211106

ABSTRACT

Motivation: Allele-specific expression (ASE) analyses aim to detect imbalanced expression of maternal versus paternal copies of an autosomal gene. Such allelic imbalance can result from a variety of cis-acting causes, including disruptive mutations within one copy of a gene that impact the stability of transcripts, as well as regulatory variants outside the gene that impact transcription initiation. Current methods for ASE estimation suffer from a number of shortcomings, such as relying on only one variant within a gene, assuming perfect phasing information across multiple variants within a gene, or failing to account for alignment biases and possible genotyping errors. Results: We developed BEASTIE, a Bayesian hierarchical model designed for precise ASE quantification at the gene level, based on given genotypes and RNA-Seq data. BEASTIE addresses the complexities of allelic mapping bias, genotyping error, and phasing errors by incorporating empirical phasing error rates derived from Genome-in-a-Bottle individual NA12878. BEASTIE surpasses existing methods in accuracy, especially in scenarios with high phasing errors. This improvement is critical for identifying rare genetic variants often obscured by such errors. Through rigorous validation on simulated data and application to real data from the 1000 Genomes Project, we establish the robustness of BEASTIE. These findings underscore the value of BEASTIE in revealing patterns of ASE across gene sets and pathways. Availability and Implementation: The software is freely available from https://github.com/x811zou/BEASTIE . BEASTIE is available as Python source code and as a Docker image. Supplementary information: Additional information is available online.
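
As a rough illustration of what Bayesian estimation of allelic imbalance means in the simplest case, the sketch below applies a conjugate Beta prior to read counts at a single heterozygous site. This is not the BEASTIE model: it ignores phasing uncertainty, mapping bias, and genotyping error, and the function names, prior, and read counts are all assumptions made for the example.

```python
import random


def posterior_params(ref_reads, alt_reads, prior_a=1.0, prior_b=1.0):
    """Beta posterior parameters for the reference-allele read fraction."""
    return prior_a + ref_reads, prior_b + alt_reads


def posterior_mean(a, b):
    """Mean of a Beta(a, b) distribution."""
    return a / (a + b)


def prob_ref_favored(a, b, draws=20000, seed=0):
    """Monte Carlo estimate of P(fraction > 0.5) under Beta(a, b)."""
    rng = random.Random(seed)
    hits = sum(rng.betavariate(a, b) > 0.5 for _ in range(draws))
    return hits / draws


# Hypothetical counts: 30 reference-allele reads and 10 alternate-allele reads.
a, b = posterior_params(30, 10)
print(round(posterior_mean(a, b), 3), prob_ref_favored(a, b))
```

A gene-level method has to combine several such sites, which is where phasing errors matter: if the alleles at two sites are assigned to the wrong haplotypes, their counts partially cancel and the imbalance is underestimated.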

16.

Characterization and bioinformatic filtering of ambient gRNAs in single-cell CRISPR screens using CLEANSER.

Liu, Siyan; Hamilton, Marisa C; Cowart, Thomas; Barrera, Alejandro; Bounds, Lexi R; Nelson, Alexander C; Doty, Richard W; Allen, Andrew S; Crawford, Gregory E; Majoros, William H; Gersbach, Charles A.

bioRxiv ; 2024 Sep 04.

Article in English | MEDLINE | ID: mdl-39282389

ABSTRACT

Recent technological developments in single-cell RNA-seq CRISPR screens enable high-throughput investigation of the genome. Through transduction of a gRNA library to a cell population followed by transcriptomic profiling by scRNA-seq, it is possible to characterize the effects of thousands of genomic perturbations on global gene expression. A major source of noise in scRNA-seq CRISPR screens are ambient gRNAs, which are contaminating gRNAs that likely originate from other cells. If not properly filtered, ambient gRNAs can result in an excess of false positive gRNA assignments. Here, we utilize CRISPR barnyard assays to characterize ambient gRNA noise in single-cell CRISPR screens. We use these datasets to develop and train CLEANSER, a mixture model that identifies and filters ambient gRNA noise. This model takes advantage of the bimodal distribution between native and ambient gRNAs and includes both gRNA and cell-specific normalization parameters, correcting for confounding technical factors that affect individual gRNAs and cells. The output of CLEANSER is the probability that a gRNA-cell assignment is in the native distribution over the ambient distribution. We find that ambient gRNA filtering methods impact differential gene expression analysis outcomes and that CLEANSER outperforms alternate approaches by increasing gRNA-cell assignment accuracy.

17.

A viral microRNA functions as an orthologue of cellular miR-155.

Gottwein, Eva; Mukherjee, Neelanjan; Sachse, Christoph; Frenzel, Corina; Majoros, William H; Chi, Jen-Tsan A; Braich, Ravi; Manoharan, Muthiah; Soutschek, Jürgen; Ohler, Uwe; Cullen, Bryan R.

Nature ; 450(7172): 1096-9, 2007 Dec 13.

Article in English | MEDLINE | ID: mdl-18075594

ABSTRACT

All metazoan eukaryotes express microRNAs (miRNAs), roughly 22-nucleotide regulatory RNAs that can repress the expression of messenger RNAs bearing complementary sequences. Several DNA viruses also express miRNAs in infected cells, suggesting a role in viral replication and pathogenesis. Although specific viral miRNAs have been shown to autoregulate viral mRNAs or downregulate cellular mRNAs, the function of most viral miRNAs remains unknown. Here we report that the miR-K12-11 miRNA encoded by Kaposi's-sarcoma-associated herpes virus (KSHV) shows significant homology to cellular miR-155, including the entire miRNA 'seed' region. Using a range of assays, we show that expression of physiological levels of miR-K12-11 or miR-155 results in the downregulation of an extensive set of common mRNA targets, including genes with known roles in cell growth regulation. Our findings indicate that viral miR-K12-11 functions as an orthologue of cellular miR-155 and probably evolved to exploit a pre-existing gene regulatory pathway in B cells. Moreover, the known aetiological role of miR-155 in B-cell transformation suggests that miR-K12-11 may contribute to the induction of KSHV-positive B-cell tumours in infected patients.

Subject(s)

Gene Expression Regulation , Human Herpesvirus 8/genetics , MicroRNAs/genetics , Viral RNA/genetics , Nucleic Acid Sequence Homology , 3' Untranslated Regions/genetics , 3' Untranslated Regions/metabolism , B-Lymphocytes/metabolism , B-Lymphocytes/pathology , Basic-Leucine Zipper Transcription Factors/genetics , Basic-Leucine Zipper Transcription Factors/metabolism , Cell Line , Viral Cell Transformation/genetics , Fanconi Anemia Complementation Group Proteins/genetics , Fanconi Anemia Complementation Group Proteins/metabolism , Gene Expression Profiling , Humans , MicroRNAs/metabolism , Proto-Oncogene Proteins c-fos/genetics , Proto-Oncogene Proteins c-fos/metabolism , Viral RNA/metabolism , Substrate Specificity

18.

Modeling the evolution of regulatory elements by simultaneous detection and alignment with phylogenetic pair HMMs.

Majoros, William H; Ohler, Uwe.

PLoS Comput Biol ; 6(12): e1001037, 2010 Dec 16.

Article in English | MEDLINE | ID: mdl-21187896

ABSTRACT

The computational detection of regulatory elements in DNA is a difficult but important problem impacting our progress in understanding the complex nature of eukaryotic gene regulation. Attempts to utilize cross-species conservation for this task have been hampered both by evolutionary changes of functional sites and poor performance of general-purpose alignment programs when applied to non-coding sequence. We describe a new and flexible framework for modeling binding site evolution in multiple related genomes, based on phylogenetic pair hidden Markov models which explicitly model the gain and loss of binding sites along a phylogeny. We demonstrate the value of this framework for both the alignment of regulatory regions and the inference of precise binding-site locations within those regions. As the underlying formalism is a stochastic, generative model, it can also be used to simulate the evolution of regulatory elements. Our implementation is scalable in terms of numbers of species and sequence lengths and can produce alignments and binding-site predictions with accuracy rivaling or exceeding current systems that specialize in only alignment or only binding-site prediction. We demonstrate the validity and power of various model components on extensive simulations of realistic sequence data and apply a specific model to study Drosophila enhancers in as many as ten related genomes and in the presence of gain and loss of binding sites. Different models and modeling assumptions can be easily specified, thus providing an invaluable tool for the exploration of biological hypotheses that can drive improvements in our understanding of the mechanisms and evolution of gene regulation.

Subject(s)

Computational Biology/methods , Molecular Evolution , Markov Chains , Transcriptional Regulatory Elements/genetics , Sequence Alignment/methods , Animals , Base Sequence , Computer Simulation , Drosophila melanogaster/genetics , Gene Expression Regulation , Molecular Sequence Data , Phylogeny , ROC Curve , DNA Sequence Analysis

19.

Complexity reduction in context-dependent DNA substitution models.

Majoros, William H; Ohler, Uwe.

Bioinformatics ; 25(2): 175-82, 2009 Jan 15.

Article in English | MEDLINE | ID: mdl-19017657

ABSTRACT

MOTIVATION: The modeling of conservation patterns in genomic DNA has become increasingly popular for a number of bioinformatic applications. While several systems developed to date incorporate context-dependence in their substitution models, the impact on computational complexity and generalization ability of the resulting higher order models invites the question of whether simpler approaches to context modeling might permit appreciable reductions in model complexity and computational cost, without sacrificing prediction accuracy. RESULTS: We formulate several alternative methods for context modeling based on windowed Bayesian networks, and compare their effects on both accuracy and computational complexity for the task of discriminating functionally distinct segments in vertebrate DNA. Our results show that substantial reductions in the complexity of both the model and the associated inference algorithm can be achieved without reducing predictive accuracy.

Subject(s)

DNA Sequence Analysis/methods , Algorithms , Bayes Theorem , Computer Simulation , DNA/chemistry , Genome , Genetic Models , Software

20.

Macronuclear genome sequence of the ciliate Tetrahymena thermophila, a model eukaryote.

Eisen, Jonathan A; Coyne, Robert S; Wu, Martin; Wu, Dongying; Thiagarajan, Mathangi; Wortman, Jennifer R; Badger, Jonathan H; Ren, Qinghu; Amedeo, Paolo; Jones, Kristie M; Tallon, Luke J; Delcher, Arthur L; Salzberg, Steven L; Silva, Joana C; Haas, Brian J; Majoros, William H; Farzad, Maryam; Carlton, Jane M; Smith, Roger K; Garg, Jyoti; Pearlman, Ronald E; Karrer, Kathleen M; Sun, Lei; Manning, Gerard; Elde, Nels C; Turkewitz, Aaron P; Asai, David J; Wilkes, David E; Wang, Yufeng; Cai, Hong; Collins, Kathleen; Stewart, B Andrew; Lee, Suzanne R; Wilamowska, Katarzyna; Weinberg, Zasha; Ruzzo, Walter L; Wloga, Dorota; Gaertig, Jacek; Frankel, Joseph; Tsao, Che-Chia; Gorovsky, Martin A; Keeling, Patrick J; Waller, Ross F; Patron, Nicola J; Cherry, J Michael; Stover, Nicholas A; Krieger, Cynthia J; del Toro, Christina; Ryder, Hilary F; Williamson, Sondra C.

PLoS Biol ; 4(9): e286, 2006 Sep.

Article in English | MEDLINE | ID: mdl-16933976

ABSTRACT

The ciliate Tetrahymena thermophila is a model organism for molecular and cellular biology. Like other ciliates, this species has separate germline and soma functions that are embodied by distinct nuclei within a single cell. The germline-like micronucleus (MIC) has its genome held in reserve for sexual reproduction. The soma-like macronucleus (MAC), which possesses a genome processed from that of the MIC, is the center of gene expression and does not directly contribute DNA to sexual progeny. We report here the shotgun sequencing, assembly, and analysis of the MAC genome of T. thermophila, which is approximately 104 Mb in length and composed of approximately 225 chromosomes. Overall, the gene set is robust, with more than 27,000 predicted protein-coding genes, 15,000 of which have strong matches to genes in other organisms. The functional diversity encoded by these genes is substantial and reflects the complexity of processes required for a free-living, predatory, single-celled organism. This is highlighted by the abundance of lineage-specific duplications of genes with predicted roles in sensing and responding to environmental conditions (e.g., kinases), using diverse resources (e.g., proteases and transporters), and generating structural complexity (e.g., kinesins and dyneins). In contrast to the other lineages of alveolates (apicomplexans and dinoflagellates), no compelling evidence could be found for plastid-derived genes in the genome. UGA, the only T. thermophila stop codon, is used in some genes to encode selenocysteine, thus making this organism the first known with the potential to translate all 64 codons in nuclear genes into amino acids. We present genomic evidence supporting the hypothesis that the excision of DNA from the MIC to generate the MAC specifically targets foreign DNA as a form of genome self-defense. The combination of the genome sequence, the functional diversity encoded therein, and the presence of some pathways missing from other model organisms makes T. thermophila an ideal model for functional genomic studies to address biological, biomedical, and biotechnological questions of fundamental importance.

Subject(s)

Protozoan Genome , Macronucleus/genetics , Biological Models , Tetrahymena thermophila/genetics , Animals , Cultured Cells , Chromosome Mapping/methods , Chromosomes , Genetic Databases , Eukaryotic Cells/physiology , Molecular Evolution , Germline Micronucleus/genetics , Animal Models , Phylogeny , Signal Transduction
