Results 1 - 20 of 111

1.
Nucleic Acids Res ; 2024 Jul 30.
Article in English | MEDLINE | ID: mdl-39077930

ABSTRACT

Off-target effects present a significant impediment to the safe and efficient use of CRISPR-Cas genome editing. Since off-target activity is influenced by the genomic sequence, the presence of sequence variants leads to varying on- and off-target profiles among different alleles or individuals. However, a reliable tool that quantifies genome editing activity in an allelic context is not available. Here, we introduce CRISPECTOR2.0, an extended version of our previously published software tool CRISPECTOR, with an allele-specific editing activity quantification option. CRISPECTOR2.0 enables reference-free, allele-aware, precise quantification of on- and off-target activity, by using de novo sample-specific single nucleotide variant (SNV) detection and statistical-based allele-calling algorithms. We demonstrate CRISPECTOR2.0 efficacy in analyzing samples containing multiple alleles and quantifying allele-specific editing activity, using data from diverse cell types, including primary human cells, plants, and an original extensive human cell line database. We identified instances where an SNV induced changes in the protospacer adjacent motif sequence, resulting in allele-specific editing. Intriguingly, differential allelic editing was also observed in regions carrying distal SNVs, hinting at the involvement of additional epigenetic factors. Our findings highlight the importance of allele-specific editing measurement as a milestone in the adaptation of efficient, accurate, and safe personalized genome editing.

2.
Mol Cell ; 65(4): 604-617.e6, 2017 Feb 16.
Article in English | MEDLINE | ID: mdl-28212748

ABSTRACT

Precise gene expression patterns are established by transcription factor (TF) binding to regulatory sequences. While these events occur in the context of chromatin, our understanding of how TF-nucleosome interplay affects gene expression is highly limited. Here, we present an assay for high-resolution measurements of both DNA occupancy and gene expression on large-scale libraries of systematically designed regulatory sequences. Our assay reveals occupancy patterns at the single-cell level. It provides an accurate quantification of the fraction of the population bound by a nucleosome and captures distinct, even adjacent, TF binding events. By applying this assay to over 1,500 promoter variants in yeast, we reveal pronounced differences in the dependency of TF activity on chromatin and classify TFs by their differential capacity to alter chromatin and promote expression. We further demonstrate how different regulatory sequences give rise to nucleosome-mediated TF collaborations that quantitatively account for the resulting expression.


Subject(s)
Chromatin/metabolism, Fungal DNA/metabolism, Nucleosomes/metabolism, Genetic Promoter Regions, Saccharomyces cerevisiae Proteins/metabolism, Saccharomyces cerevisiae/metabolism, Transcription Factors/metabolism, Binding Sites, Chromatin/genetics, Computational Biology, Fungal DNA/genetics, Genetic Databases, Fungal Gene Expression Regulation, Gene Library, High-Throughput Screening Assays, Nucleosomes/genetics, Protein Binding, Saccharomyces cerevisiae/genetics, Saccharomyces cerevisiae Proteins/genetics, Transcription Factors/genetics
3.
Skin Res Technol ; 30(5): e13706, 2024 May.
Article in English | MEDLINE | ID: mdl-38721854

ABSTRACT

BACKGROUND: The incidence rates of cutaneous squamous cell carcinoma (cSCC) and basal cell carcinoma (BCC) skin cancers are rising, while the current diagnostic process is time-consuming. We describe the development of a novel approach to high-throughput sampling of tissue lipids using electroporation-based biopsy, termed e-biopsy. We report on the ability of the e-biopsy technique to harvest large amounts of lipids from human skin samples. MATERIALS AND METHODS: Here, 168 lipids were reliably identified from 12 patients providing a total of 13 samples. The extracted lipids were profiled with ultra-performance liquid chromatography and tandem mass spectrometry (UPLC-MS-MS) providing cSCC, BCC, and healthy skin lipidomic profiles. RESULTS: Comparative analysis identified 27 differentially expressed lipids (p < 0.05). The general profile trend is low diglycerides in both cSCC and BCC, high phospholipids in BCC, and high lyso-phospholipids in cSCC compared to healthy skin tissue samples. CONCLUSION: The results contribute to the growing body of knowledge that can potentially lead to novel insights into these skin cancers and demonstrate the potential of the e-biopsy technique for the analysis of lipidomic profiles of human skin tissues.


Subject(s)
Basal Cell Carcinoma, Squamous Cell Carcinoma, Electroporation, Lipidomics, Skin Neoplasms, Skin, Humans, Basal Cell Carcinoma/pathology, Basal Cell Carcinoma/metabolism, Basal Cell Carcinoma/diagnosis, Skin Neoplasms/pathology, Skin Neoplasms/metabolism, Squamous Cell Carcinoma/pathology, Squamous Cell Carcinoma/metabolism, Squamous Cell Carcinoma/chemistry, Lipidomics/methods, Biopsy, Skin/pathology, Skin/metabolism, Skin/chemistry, Female, Male, Electroporation/methods, Middle Aged, Aged, Lipids/analysis, Tandem Mass Spectrometry/methods
4.
PLoS Genet ; 17(9): e1009805, 2021 09.
Article in English | MEDLINE | ID: mdl-34570750

ABSTRACT

RNA splicing is a key process in eukaryotic gene expression, in which an intron is spliced out of a pre-mRNA molecule to eventually produce a mature mRNA. Most intron-containing genes are constitutively spliced, hence efficient splicing of an intron is crucial for efficient regulation of gene expression. Here we use a large synthetic oligo library of ~20,000 variants to explore how different intronic sequence features affect splicing efficiency and mRNA expression levels in S. cerevisiae. Introns are defined by three functional sites, the 5' donor site, the branch site, and the 3' acceptor site. Using a combinatorial design of synthetic introns, we demonstrate how non-consensus splice site sequences in each of these sites affect splicing efficiency. We then show that S. cerevisiae splicing machinery tends to select alternative 3' splice sites downstream of the original site, and we suggest that this tendency created a selective pressure, leading to the avoidance of cryptic splice site motifs near introns' 3' ends. We further use natural intronic sequences from other yeast species, whose splicing machineries have diverged to various extents, to show how intron architectures in the various species have been adapted to the organism's splicing machinery. We suggest that the observed tendency for cryptic splicing is a result of a loss of a specific splicing factor, U2AF1. Lastly, we show that synthetic sequences containing two introns give rise to alternative RNA isoforms in S. cerevisiae, demonstrating that merely a synthetic fusion of two introns might suffice to facilitate alternative splicing in yeast. Our study reveals novel mechanisms by which introns are shaped in evolution to allow cells to regulate their transcriptome. In addition, it provides a valuable resource to study the regulation of constitutive and alternative splicing in a model organism.


Subject(s)
RNA Splicing, Saccharomyces cerevisiae/genetics, Computational Biology/methods, Molecular Evolution, Fungal Genes, High-Throughput Nucleotide Sequencing, Introns, Messenger RNA/genetics
5.
Bioinformatics ; 37(23): 4451-4459, 2021 12 07.
Article in English | MEDLINE | ID: mdl-34255820

ABSTRACT

MOTIVATION: The log-rank test is widely used to assess the statistical significance of observed differences in survival when comparing two or more groups. The log-rank test is based on several assumptions that support the validity of the calculations. It is naturally assumed, implicitly, that no errors occur in the labeling of the samples. That is, the mapping between samples and groups is perfectly correct. In this work, we investigate how test results may be affected when considering some errors in the original labeling. RESULTS: We introduce and define the uncertainty that arises from labeling errors in the log-rank test. In order to deal with this uncertainty, we develop a novel algorithm for efficiently calculating a stability interval around the original log-rank P-value and prove its correctness. We demonstrate our algorithm on several datasets. AVAILABILITY AND IMPLEMENTATION: We provide a Python implementation, called LoRSI, for calculating the stability interval using our algorithm: https://github.com/YakhiniGroup/LoRSI. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
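
To make the idea of label-error sensitivity concrete, the following is a minimal sketch and not the LoRSI algorithm itself: it computes a standard two-group log-rank P-value and then brute-forces every single-label flip to see how far the P-value can move. The function names `logrank_p` and `single_flip_interval`, the restriction to one flipped label, and the toy data are all assumptions made for illustration; the efficient algorithm described in the abstract is not reproduced here.

```python
import math


def logrank_p(times, events, groups):
    """Two-group log-rank P-value (chi-square approximation, 1 degree of freedom).

    times  -- follow-up time per sample
    events -- 1 if the event was observed, 0 if censored
    groups -- 0 or 1 group label per sample
    """
    obs_minus_exp = 0.0
    variance = 0.0
    event_times = sorted({t for t, e in zip(times, events) if e})
    for t in event_times:
        at_risk = [g for ti, g in zip(times, groups) if ti >= t]
        n = len(at_risk)          # samples still at risk at time t
        n1 = sum(at_risk)         # of those, how many are in group 1
        d = sum(1 for ti, e in zip(times, events) if ti == t and e)
        d1 = sum(1 for ti, e, g in zip(times, events, groups)
                 if ti == t and e and g == 1)
        obs_minus_exp += d1 - d * n1 / n
        if n > 1:
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    if variance == 0:
        return 1.0
    chi2 = obs_minus_exp ** 2 / variance
    # Survival function of a chi-square variable with 1 degree of freedom.
    return math.erfc(math.sqrt(chi2 / 2))


def single_flip_interval(times, events, groups):
    """Hypothetical brute force: range of P-values over all single-label flips."""
    p_values = [logrank_p(times, events, groups)]
    for i in range(len(groups)):
        flipped = list(groups)
        flipped[i] = 1 - flipped[i]
        p_values.append(logrank_p(times, events, flipped))
    return min(p_values), max(p_values)


# Toy data (invented): group 1 fails early, group 0 fails late.
times = [1, 2, 3, 4, 5, 6, 7, 8]
events = [1] * 8
groups = [1, 1, 1, 1, 0, 0, 0, 0]
p_original = logrank_p(times, events, groups)
p_low, p_high = single_flip_interval(times, events, groups)
```

The brute force costs one full test per flipped label, which is exactly the kind of cost an efficient stability-interval algorithm would aim to avoid.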


Subject(s)
Algorithms, Uncertainty
6.
Bioinformatics ; 37(21): 3796-3804, 2021 11 05.
Article in English | MEDLINE | ID: mdl-34358288

ABSTRACT

MOTIVATION: Tumour heterogeneity is being increasingly recognized as an important characteristic of cancer and as a determinant of prognosis and treatment outcome. Emerging spatial transcriptomics data hold the potential to further our understanding of tumour heterogeneity and its implications. However, existing statistical tools are not sufficiently powerful to capture heterogeneity in the complex setting of spatial molecular biology. RESULTS: We provide a statistical solution, the HeTerogeneity Average index (HTA), specifically designed to handle the multivariate nature of spatial transcriptomics. We prove that HTA has an approximately normal distribution, therefore lending itself to efficient statistical assessment and inference. We first demonstrate that HTA accurately reflects the level of heterogeneity in simulated data. We then use HTA to analyze heterogeneity in two cancer spatial transcriptomics datasets: spatial RNA sequencing by 10x Genomics and spatial transcriptomics inferred from H&E. Finally, we demonstrate that HTA also applies to 3D spatial data using brain MRI. In spatial RNA sequencing, we use a known combination of molecular traits to assert that HTA aligns with the expected outcome for this combination. We also show that HTA captures immune-cell infiltration at multiple resolutions. In digital pathology, we show how HTA can be used in survival analysis and demonstrate that high levels of heterogeneity may be linked to poor survival. In brain MRI, we show that HTA differentiates between normal ageing, Alzheimer's disease and two tumours. HTA also extends beyond molecular biology and medical imaging, and can be applied to many domains, including GIS. AVAILABILITY AND IMPLEMENTATION: Python package and source code are available at: https://github.com/alonalj/hta. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Neoplasms, Transcriptome, Humans, Biomedical Technology Assessment, Genomics, Neuroimaging
7.
Bioinformatics ; 37(5): 720-722, 2021 05 05.
Article in English | MEDLINE | ID: mdl-32840559

ABSTRACT

MOTIVATION: Recent years have seen a growing number and an expanding scope of studies using synthetic oligo libraries for a range of applications in synthetic biology. As experiments grow in number and complexity, analysis tools can facilitate quality control and support better assessment and inference. RESULTS: We present a novel analysis tool, called SOLQC, which enables fast and comprehensive analysis of synthetic oligo libraries, based on NGS analysis performed by the user. SOLQC provides statistical information such as the distribution of variant representation, different error rates and their dependence on sequence or library properties. SOLQC produces graphical reports from the analysis, in a flexible format. We demonstrate SOLQC by analyzing literature libraries. We also discuss the potential benefits and relevance of the different components of the analysis. AVAILABILITY AND IMPLEMENTATION: SOLQC is free software for non-commercial use, available at https://app.gitbook.com/@yoav-orlev/s/solqc/. For commercial use please contact the authors. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Libraries, Software, Gene Library, Quality Control, Synthetic Biology
8.
PLoS Comput Biol ; 17(2): e1008608, 2021 02.
Article in English | MEDLINE | ID: mdl-33566819

ABSTRACT

Different miRNA profiling protocols and technologies introduce differences in the resulting quantitative expression profiles. These include differences in the presence (and measurability) of certain miRNAs. We present and examine a method based on quantile normalization, Adjusted Quantile Normalization (AQuN), to combine miRNA expression data from multiple studies in breast cancer into a single joint dataset for integrative analysis. By pooling multiple datasets, we obtain increased statistical power, surfacing patterns that do not emerge as statistically significant when separately analyzing these datasets. To merge several datasets, as we do here, one needs to overcome both technical and batch differences between these datasets. We compare several approaches for merging and jointly analyzing miRNA datasets. We investigate the statistical confidence for known results and highlight potential new findings that resulted from the joint analysis using AQuN. In particular, we detect several miRNAs to be differentially expressed in estrogen receptor (ER) positive versus ER negative samples. In addition, we identify new potential biomarkers and therapeutic targets for both clinical groups. As a specific example, using the AQuN-derived dataset we detect hsa-miR-193b-5p to have a statistically significant over-expression in the ER positive group, a phenomenon that was not previously reported. Furthermore, as demonstrated by functional assays in breast cancer cell lines, overexpression of hsa-miR-193b-5p in breast cancer cell lines resulted in decreased cell viability in addition to inducing apoptosis. Together, these observations suggest a novel functional role for this miRNA in breast cancer. Packages implementing AQuN are provided for Python and Matlab: https://github.com/YakhiniGroup/PyAQN.
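
For readers unfamiliar with the baseline technique, here is a minimal sketch of plain quantile normalization, on which the abstract says AQuN is based. It does not implement the adjustment that AQuN adds (for example, handling miRNAs that are not measurable on every platform); the function name `quantile_normalize` and the toy values are assumptions for illustration, and ties are broken by input order rather than averaged.

```python
def quantile_normalize(samples):
    """Plain quantile normalization.

    samples -- list of equal-length lists, one list of expression values per sample.
    Returns new lists in which every sample shares the same value distribution:
    the value at rank r is replaced by the mean of all rank-r values across samples.
    """
    n_features = len(samples[0])
    sorted_samples = [sorted(s) for s in samples]
    rank_means = [
        sum(s[r] for s in sorted_samples) / len(samples)
        for r in range(n_features)
    ]
    normalized = []
    for s in samples:
        # Feature indices ordered from lowest to highest value in this sample.
        order = sorted(range(n_features), key=lambda i: s[i])
        out = [0.0] * n_features
        for rank, feature_index in enumerate(order):
            out[feature_index] = rank_means[rank]
        normalized.append(out)
    return normalized


# Toy data (invented): two samples, three features each.
result = quantile_normalize([[5, 2, 3], [4, 1, 6]])
```

After normalization every sample has the same sorted values, so differences between samples reflect rank changes rather than platform-specific scale.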


Subject(s)
Breast Neoplasms/genetics, Breast Neoplasms/metabolism, Gene Expression Profiling, Neoplastic Gene Expression Regulation, MicroRNAs/metabolism, Algorithms, Biomarkers/metabolism, Tumor Biomarkers/genetics, Tumor Cell Line, Computer Simulation, Estrogen Receptor alpha/metabolism, Female, Humans, MCF-7 Cells, Oligonucleotide Array Sequence Analysis, Programming Languages, Messenger RNA/genetics
9.
Sensors (Basel) ; 22(16), 2022 Aug 22.
Article in English | MEDLINE | ID: mdl-36016067

ABSTRACT

Analysing human physiological data allows access to the health state and the state of mind of the subject individual. Whenever a person is sick, having a panic attack, happy or scared, physiological signals will be different. In terms of physiological signals, we focus, in this manuscript, on monitoring breathing patterns. The scope can be extended to also address heart rate and other variables. We describe an analysis of breathing rate patterns during activities including resting, walking, running and watching a movie. We model normal breathing behaviours by statistically analysing signals, processed to represent quantities of interest. We consider moving maximum/minimum, the amplitude and the Fourier transform of the respiration signal, working with different window sizes. We then learn a statistical model for the basal behaviour, per individual, and detect outliers. When outliers are detected, a system that incorporates our approach would send a visible signal through a smart garment or through other means. We describe alert generation performance in two datasets: one literature dataset and one collected as a field study for this work. In particular, when learning personal rest distributions for the breathing signals of 14 subjects, we see alerts generated more often when the same individual is running than when they are tested in rest conditions.
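
The pipeline sketched in the abstract (windowed features, a per-individual rest model, outlier alerts) can be illustrated as follows. This is only a sketch under stated assumptions: the feature is the moving maximum minus the moving minimum over non-overlapping windows, the rest model is a mean and standard deviation, and an alert is a z-score above 3. The function names, the threshold and the synthetic sine-wave signals are all invented for illustration and are not taken from the paper.

```python
import math
import statistics


def window_amplitudes(signal, window):
    """Moving maximum minus moving minimum over non-overlapping windows."""
    return [
        max(signal[i:i + window]) - min(signal[i:i + window])
        for i in range(0, len(signal) - window + 1, window)
    ]


def fit_rest_model(rest_signal, window):
    """Per-individual rest model: mean and sample std of window amplitudes."""
    amplitudes = window_amplitudes(rest_signal, window)
    return statistics.mean(amplitudes), statistics.stdev(amplitudes)


def alerts(signal, model, window, z_threshold=3.0):
    """Flag each window whose amplitude is an outlier under the rest model."""
    mean, std = model
    return [
        abs(a - mean) / std > z_threshold
        for a in window_amplitudes(signal, window)
    ]


# Synthetic signals (invented): shallow breathing at rest, deeper when running.
rest = [(1 + 0.1 * math.sin(i / 50)) * math.sin(2 * math.pi * i / 20)
        for i in range(400)]
running = [3 * math.sin(2 * math.pi * i / 20) for i in range(200)]

model = fit_rest_model(rest, window=40)
rest_alerts = alerts(rest, model, window=40)
running_alerts = alerts(running, model, window=40)
```

On this synthetic data no rest window is flagged while every running window is, which mirrors the qualitative result reported in the abstract.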


Subject(s)
Respiration, Respiratory Rate, Humans, Statistical Models, Rest
10.
Stem Cells ; 37(2): 176-189, 2019 02.
Article in English | MEDLINE | ID: mdl-30379370

ABSTRACT

The interactions of cancer stem cells (CSCs) within the tumor microenvironment (TME) contribute to the overall phenomenon of intratumoral heterogeneity, which also involves CSC interactions with noncancer stromal cells. Comprehensive understanding of the tumorigenesis process requires elucidating the coordinated gene expression between cancer and tumor stromal cells for each tumor. We show that human gastric cancer cells (GSC1) subvert gene expression and cytokine production by mesenchymal stem cells (GSC-MSC), thus promoting tumor progression. Using mixed composition of human tumor xenografts, organotypic culture, and in vitro assays, we demonstrate GSC1-mediated specific reprogramming of "naïve" MSC into specialized tumor associated MSC equipped with a tumor-promoting phenotype. Although paracrine effect of GSC-MSC or primed-MSC is sufficient to enable 2D growth of GSC1, cell-cell interaction with GSC-MSC is necessary for 3D growth and in vivo tumor formation. At both the transcriptional and the protein level, RNA-Seq and proteome analyses, respectively, revealed increased R-spondin expression in primed-MSC, and paracrine and juxtacrine mediated elevation of Lgr5 expression in GSC1, suggesting GSC-MSC-mediated support of cancer stemness in GSC1. CSC properties are sustained in vivo through the interplay between GSC1 and GSC-MSC, activating the R-spondin/Lgr5 axis and WNT/β-catenin signaling pathway. β-Catenin+ cell clusters show β-catenin nuclear localization, indicating the activation of the WNT/β-catenin signaling pathway in these cells. The β-catenin+ cluster of cells overlaps the Lgr5+ cells; however, not all Lgr5+ cells express β-catenin. A predominant means to sustain the CSC contribution to tumor progression appears to be subversion of MSC in the TME by cancer cells. Stem Cells 2019;37:176-189.


Subject(s)
Cellular Reprogramming/genetics, Mesenchymal Stem Cells/metabolism, Stomach Neoplasms/genetics, Humans, Stomach Neoplasms/metabolism, Tumor Microenvironment
11.
Nucleic Acids Res ; 46(W1): W221-W228, 2018 07 02.
Article in English | MEDLINE | ID: mdl-29800452

ABSTRACT

Gene expression regulation is highly dependent on binding of RNA-binding proteins (RBPs) to their RNA targets. Growing evidence supports the notion that both RNA primary sequence and its local secondary structure play a role in specific Protein-RNA recognition and binding. Despite the great advance in high-throughput experimental methods for identifying sequence targets of RBPs, predicting the specific sequence and structure binding preferences of RBPs remains a major challenge. We present a novel webserver, SMARTIV, designed for discovering and visualizing combined RNA sequence and structure motifs from high-throughput RNA-binding data, generated from in-vivo experiments. The uniqueness of SMARTIV is that it predicts motifs from enriched k-mers that combine information from ranked RNA sequences and their predicted secondary structure, obtained using various folding methods. Consequently, SMARTIV generates Position Weight Matrices (PWMs) in a combined sequence and structure alphabet with assigned P-values. SMARTIV concisely represents the sequence and structure motif content as a single graphical logo, which is informative and easy for visual perception. SMARTIV was examined extensively on a variety of high-throughput binding experiments for RBPs from different families, generated from different technologies, showing consistent and accurate results. Finally, SMARTIV is a user-friendly webserver, highly efficient in run-time and freely accessible via http://smartiv.technion.ac.il/.


Subject(s)
RNA-Binding Proteins/metabolism, RNA/chemistry, Software, Binding Sites, Internet, Nucleic Acid Conformation, Nucleotide Motifs, Position-Specific Scoring Matrices, RNA Sequence Analysis
12.
BMC Bioinformatics ; 19(1): 368, 2018 Oct 10.
Article in English | MEDLINE | ID: mdl-30305012

ABSTRACT

BACKGROUND: Synthetic biology and related techniques enable genome-scale, high-throughput investigation of the effect on organism fitness of different gene knock-downs/outs and of other modifications of genomic sequence. RESULTS: We develop statistical and computational pipelines and frameworks for analyzing high-throughput fitness data over a genome-scale set of sequence variants. Analyzing data from a high-throughput knock-down/knock-out bacterial study, we investigate differences and determinants of the effect on fitness in different conditions. Comparing fitness vectors of genes, across tens of conditions, we observe that fitness consequences strongly depend on genomic location and more weakly depend on gene sequence similarity and on functional relationships. In analyzing promoter sequences, we identified motifs associated with conditions studied in bacterial media such as Casaminos, D-glucose, Sucrose, and other sugars and amino-acid sources. We also use fitness data to infer genes associated with orphan metabolic reactions in the iJO1366 E. coli metabolic model. To do this, we developed a new computational method that integrates gene fitness and gene expression profiles within a given reaction network neighborhood to associate this reaction with a set of genes that potentially encode the catalyzing proteins. We then apply this approach to predict candidate genes for 107 orphan reactions in iJO1366. Furthermore, we validate our methodology with known reactions using a leave-one-out approach. Specifically, using top-20 candidates selected based on combined fitness and expression datasets, we correctly reconstruct 39.7% of the reactions, as compared to 33% based on fitness and to 26% based on expression separately, and to 4.02% as a random baseline. Our model improvement results include a novel association of a gene to an orphan cytosine nucleosidation reaction. CONCLUSION: Our pipeline for metabolic modeling shows a clear benefit of using fitness data for predicting genes of orphan reactions. Along with the analysis pipelines we developed, it can be used to analyze similar high-throughput data.


Subject(s)
Exercise Test/methods, Genome-Wide Association Study/methods, Genomics/methods, Humans, Biological Models
13.
Genome Res ; 25(7): 1018-29, 2015 Jul.
Article in English | MEDLINE | ID: mdl-25762553

ABSTRACT

Binding of transcription factors (TFs) to regulatory sequences is a pivotal step in the control of gene expression. Despite many advances in the characterization of sequence motifs recognized by TFs, our ability to quantitatively predict TF binding to different regulatory sequences is still limited. Here, we present a novel experimental assay termed BunDLE-seq that provides quantitative measurements of TF binding to thousands of fully designed sequences of 200 bp in length within a single experiment. Applying this binding assay to two yeast TFs, we demonstrate that sequences outside the core TF binding site profoundly affect TF binding. We show that TF-specific models based on the sequence or DNA shape of the regions flanking the core binding site are highly predictive of the measured differential TF binding. We further characterize the dependence of TF binding, accounting for measurements of single and co-occurring binding events, on the number and location of binding sites and on the TF concentration. Finally, by coupling our in vitro TF binding measurements, and another application of our method probing nucleosome formation, to in vivo expression measurements carried out with the same template sequences serving as promoters, we offer insights into mechanisms that may determine the different expression outcomes observed. Our assay thus paves the way to a more comprehensive understanding of TF binding to regulatory sequences and allows the characterization of TF binding determinants within and outside of core binding sites.


Subject(s)
Binding Sites, Transcription Factors/metabolism, Computational Biology/methods, Nucleosomes/metabolism, Poly A, Poly T, Protein Binding, Nucleic Acid Regulatory Sequences, Saccharomyces cerevisiae/genetics, Saccharomyces cerevisiae/metabolism, Thermodynamics
14.
Methods ; 118-119: 73-81, 2017 04 15.
Article in English | MEDLINE | ID: mdl-28274760

ABSTRACT

RNA binding proteins (RBPs) play an important role in regulating many processes in the cell. RBPs often recognize their RNA targets in a specific manner. In addition to the RNA primary sequence, the structure of the RNA has been shown to play a central role in RNA recognition by RBPs. In recent years, many experimental approaches, both in vitro and in vivo, were developed and employed to identify and characterize RBP targets and extract their binding specificities. In vivo binding techniques, such as CrossLinking and ImmunoPrecipitation (CLIP)-based methods, enable the characterization of protein binding sites on RNA targets. However, these methods do not provide information regarding the structural preferences of the protein. While methods to obtain the structure of RNA are available, inferring both the sequence and the structure preferences of RBPs remains a challenge. Here we present SMARTIV, a novel computational tool for discovering combined sequence and structure binding motifs from in vivo RNA binding data relying on the sequences of the target sites, the ranking of their binding scores and their predicted secondary structure. The combined motifs are provided in a unified representation that is informative and easy for visual perception. We tested the method on CLIP-seq data from different platforms for a variety of RBPs. Overall, we show that our results are highly consistent with known binding motifs of RBPs, offering additional information on their structural preferences.


Subject(s)
Algorithms, High-Throughput Nucleotide Sequencing/methods, RNA-Binding Proteins/genetics, RNA/chemistry, RNA Sequence Analysis/statistics & numerical data, Software, Base Sequence, Binding Sites, Cell Line, Datasets as Topic, Humans, Immunoprecipitation, Nucleic Acid Conformation, Protein Binding, RNA/genetics, RNA/metabolism, RNA-Binding Proteins/metabolism, RNA Sequence Analysis/methods, Transcriptome
15.
PLoS Genet ; 11(4): e1005147, 2015 Apr.
Article in English | MEDLINE | ID: mdl-25875337

ABSTRACT

The 3' end genomic region encodes a wide range of regulatory processes including mRNA stability, 3' end processing and translation. Here, we systematically investigate the sequence determinants of 3' end mediated expression control by measuring the effect of 13,000 designed 3' end sequence variants on constitutive expression levels in yeast. By including a high resolution scanning mutagenesis of more than 200 native 3' end sequences in this designed set, we found that most mutations had only a mild effect on expression, and that the vast majority (~90%) of strongly affecting mutations localized to a single positive TA-rich element, similar to a previously described 3' end processing efficiency element, and resulted in up to ten-fold decrease in expression. Measurements of 3' UTR lengths revealed that these mutations result in mRNAs with aberrantly long 3' UTRs, confirming the role for this element in 3' end processing. Interestingly, we found that other sequence elements that were previously described in the literature to be part of the polyadenylation signal had a minor effect on expression. We further characterize the sequence specificities of the TA-rich element using additional synthetic 3' end sequences and show that its activity is sensitive to single base pair mutations and strongly depends on the A/T content of the surrounding sequences. Finally, using a computational model, we show that the strength of this element in native 3' end sequences can explain some of their measured expression variability (R = 0.41). Together, our results emphasize the importance of efficient 3' end processing for endogenous protein levels and contribute to an improved understanding of the sequence elements involved in this process.


Subject(s)
3' Untranslated Regions, Fungal Gene Expression Regulation, Yeasts/genetics, Fungal Genome, Messenger RNA/genetics, Messenger RNA/metabolism, Yeasts/metabolism
16.
Genome Res ; 24(10): 1698-706, 2014 Oct.
Article in English | MEDLINE | ID: mdl-25030889

ABSTRACT

Genetically identical cells exhibit large variability (noise) in gene expression, with important consequences for cellular function. Although the amount of noise decreases with and is thus partly determined by the mean expression level, the extent to which different promoter sequences can deviate away from this trend is not fully known. Here, we present a high-throughput method for measuring promoter-driven noise for thousands of designed synthetic promoters in parallel. We use it to investigate how promoters encode different noise levels and find that the noise levels of promoters with similar mean expression levels can vary more than one order of magnitude, with nucleosome-disfavoring sequences resulting in lower noise and more transcription factor binding sites resulting in higher noise. We propose a kinetic model of gene expression that takes into account the nonspecific DNA binding and one-dimensional sliding along the DNA, which occurs when transcription factors search for their target sites. We show that this assumption can improve the prediction of the mean-independent component of expression noise for our designed promoter sequences, suggesting that a transcription factor target search may affect gene expression noise. Consistent with our findings in designed promoters, we find that binding-site multiplicity in native promoters is associated with higher expression noise. Overall, our results demonstrate that small changes in promoter DNA sequence can tune noise levels in a manner that is predictable and partly decoupled from effects on the mean expression levels. These insights may assist in designing promoters with desired noise levels.


Subject(s)
Computational Biology/methods, DNA/metabolism, Gene Expression, Genetic Promoter Regions, Saccharomyces cerevisiae/genetics, Binding Sites, Fungal Genes, Linear Models, Molecular Sequence Data, Saccharomyces cerevisiae/metabolism, Saccharomyces cerevisiae Proteins/genetics, Saccharomyces cerevisiae Proteins/metabolism, Transcription Factors/metabolism
17.
Bioinformatics ; 37(22): 4297, 2021 Nov 18.
Article in English | MEDLINE | ID: mdl-34695180
19.
Bioinformatics ; 32(17): i559-i566, 2016 09 01.
Article in English | MEDLINE | ID: mdl-27587675

ABSTRACT

MOTIVATION: Complex interactions among alleles often drive differences in inherited properties including disease predisposition. Isolating the effects of these interactions requires phasing information that is difficult to measure or infer. Furthermore, prevalent sequencing technologies used in the essential first step of determining a haplotype limit the range of that step to the span of reads, namely hundreds of bases. With the advent of pseudo-long read technologies, observable partial haplotypes can span several orders of magnitude more. Yet, measuring whole-genome-single-individual haplotypes remains a challenge. A different view of whole genome measurement addresses the 3D structure of the genome, with great development of Hi-C techniques in recent years. A shortcoming of current Hi-C, however, is the difficulty in inferring information that is specific to each of a pair of homologous chromosomes. RESULTS: In this work, we develop a robust algorithmic framework that takes two measurement-derived datasets, raw Hi-C and partial short-range haplotypes, and constructs the full-genome haplotype as well as phased diploid Hi-C maps. By analyzing both data sets together we thus bridge important gaps in both technologies: from short to long haplotypes and from un-phased to phased Hi-C. We demonstrate that our method can recover ground truth haplotypes with high accuracy, using measured biological data as well as simulated data. We analyze the impact of noise, Hi-C sequencing depth and measured haplotype lengths on performance. Finally, we use the inferred 3D structure of a human genome to point at nuclear co-localization of transcription factor targets. AVAILABILITY AND IMPLEMENTATION: The implementation is available at https://github.com/YakhiniGroup/SpectraPh CONTACT: zohar.yakhini@gmail.com SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Chromosomes, Human Genome, Haplotypes, Molecular Conformation, Algorithms, Genetic Variation, Genome-Wide Association Study, Humans
20.
Bioinformatics ; 32(17): i464-i472, 2016 09 01.
Article in English | MEDLINE | ID: mdl-27587663

ABSTRACT

MOTIVATION: It is often the case in biological measurement data that results are given as a ranked list of quantities, for example, differential expression (DE) of genes as inferred from microarrays or RNA-seq. Recent years brought considerable progress in statistical tools for enrichment analysis in ranked lists. Several tools are now available that allow users to break the fixed set paradigm in assessing statistical enrichment of sets of genes. Continuing with the example, these tools identify factors that may be associated with measured differential expression. A drawback of existing tools is their focus on identifying single factors associated with the observed or measured ranks, failing to address relationships between these factors. One example, shown in our results, is a scenario in which genes targeted by multiple miRNAs play a central role in the DE signal but the effect of each single miRNA is too subtle to be detected. RESULTS: We propose statistical and algorithmic approaches for selecting a sub-collection of factors that can be aggregated into one ranked list that is heuristically most associated with an input ranked list (pivot). We examine performance on simulated data and apply our approach to cancer datasets. We find small sub-collections of miRNA that are statistically associated with gene DE in several types of cancer, suggesting miRNA cooperativity in driving disease related processes. Many of our findings are consistent with known roles of miRNAs in cancer, while others suggest previously unknown roles for certain miRNAs. AVAILABILITY AND IMPLEMENTATION: Code and instructions for our algorithmic framework, MULSEA, are in: https://github.com/YakhiniGroup/MULSEA CONTACT: dalia.cohn@gmail.com SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Algorithms, Gene Expression Regulation, MicroRNAs, Statistical Models, Computational Biology/methods, Factor Analysis, Gene Expression Profiling, Gene Regulatory Networks, Humans, Neoplasms