ABSTRACT
The globin genes are archetypal tissue-specific genes that are silent in most tissues and are expressed only in late-stage erythroblasts during terminal erythroid differentiation. The transcriptional activation of the β-globin gene is under the control of proximal and distal regulatory elements located on chromosome 11p15.4, including the β-globin locus control region (LCR). The incorporation of selected LCR elements in lentiviral vectors encoding β and β-like globin genes has enabled successful genetic treatment of the β-thalassemias and sickle cell disease. However, recent occurrences of benign clonal expansions in thalassemic patients and myelodysplastic syndrome in patients with sickle cell disease call attention to the non-erythroid functions of these powerful vectors. Here we demonstrate that lentivirally encoded LCR elements, in particular HS1 and HS2, can be activated in early hematopoietic cells including hematopoietic stem cells and myeloid progenitors. This activity is position-dependent and results in the transcriptional activation of a nearby reporter gene in these progenitor cell populations. We further show that flanking a globin vector with an insulator can effectively restrain this non-erythroid activity without impairing therapeutic globin expression. Globin lentiviral vectors harboring powerful LCR HS elements may thus carry a risk of trans-activating cancer-related genes, a risk that can be mitigated by a suitable insulator.
Subjects
Sickle Cell Anemia , Globins , Sickle Cell Anemia/genetics , Genetic Therapy/methods , Genetic Vectors/genetics , Globins/genetics , Hematopoietic Stem Cells/metabolism , Humans , beta-Globins/genetics , beta-Globins/metabolism
ABSTRACT
The decoding of transcription factor (TF) binding signals in genomic DNA is a fundamental problem. Here we present a prediction model called BindSpace that learns to embed DNA sequences and TF labels into the same space. By training on binding data from hundreds of TFs and embedding over 1 M DNA sequences, BindSpace achieves state-of-the-art multiclass binding prediction performance, in vitro and in vivo, and can distinguish between signals of closely related TFs.
Subjects
Algorithms , Computational Biology/methods , DNA/metabolism , Machine Learning , Transcription Factors/metabolism , Binding Sites , Chromatin Immunoprecipitation , DNA/chemistry , Humans , Protein Binding
ABSTRACT
About one-fifth of the genes in budding yeast are essential for haploid viability and cannot be functionally assessed using standard genetic approaches such as gene deletion. To facilitate genetic analysis of essential genes, we and others have assembled collections of yeast strains expressing temperature-sensitive (ts) alleles of essential genes. To explore the phenotypes caused by essential gene mutation, we used a panel of genetically engineered fluorescent markers to examine the morphology of cells in the ts strain collection by high-throughput microscopy. Here, we describe the design and implementation of an online database, PhenoM (Phenomics of yeast Mutants), for storing, retrieving, visualizing and mining the quantitative single-cell measurements extracted from micrographs of the ts mutant cells. PhenoM allows users to rapidly search and retrieve raw images and their quantified morphological data for genes of interest. The database also provides several data-mining tools, including a PhenoBlast module for phenotypic comparison between mutant strains and a Gene Ontology module for functional enrichment analysis of gene sets showing similar morphological alterations. The current PhenoM version 1.0 contains 78,194 morphological images and 1,909,914 cells covering six subcellular compartments or structures for 775 ts alleles spanning 491 essential genes. PhenoM is freely available at http://phenom.ccbr.utoronto.ca/.
Subjects
Genetic Databases , Essential Genes , Fungal Genes , Mutation , Phenotype , Saccharomyces cerevisiae/genetics , Data Mining , Saccharomyces cerevisiae/cytology
ABSTRACT
Human cancers arise through the sequential acquisition of somatic mutations that create successive clonal populations. Human cancer evolution models could help illuminate this process and inform therapeutic intervention at an early disease stage, but their creation has faced significant challenges. Here, we combined induced pluripotent stem cell (iPSC) and CRISPR-Cas9 technologies to develop a model of the clonal evolution of acute myeloid leukemia (AML). Through the stepwise introduction of three driver mutations, we generated iPSC lines that, upon hematopoietic differentiation, capture distinct premalignant stages, including clonal hematopoiesis (CH) and myelodysplastic syndrome (MDS), culminating in a transplantable leukemia, and recapitulate transcriptional and chromatin accessibility signatures of primary human MDS and AML. By mapping dynamic changes in transcriptomes and chromatin landscapes, we characterize transcriptional programs driving specific transitions between disease stages. We identify cell-autonomous dysregulation of inflammatory signaling as an early and persistent event in leukemogenesis and a promising early therapeutic target.
Subjects
Induced Pluripotent Stem Cells , Acute Myeloid Leukemia , Clonal Evolution/genetics , Clustered Regularly Interspaced Short Palindromic Repeats , Gene Editing , Humans , Acute Myeloid Leukemia/genetics , Mutation
ABSTRACT
MOTIVATION: Fluorescence imaging has become commonplace for quantitatively measuring mRNA or protein expression in cells and tissues. However, such expression data are usually relative; absolute concentrations or molecular copy numbers are typically not known. While this is satisfactory for many applications, absolute measures of expression are necessary for certain kinds of quantitative network modeling and for the analysis of expression noise. RESULTS: We propose two methods for estimating molecular copy numbers from single uncalibrated expression images of tissues. To make their estimates, these methods rely on expression variability between cells, due either to steady-state fluctuations or to unequal distribution of molecules during cell division. We apply these methods to 152 protein fluorescence expression images of Drosophila melanogaster embryos during early development, generating copy number estimates for 14 genes in the segmentation network. We also analyze the effects of noise on our estimators and compare with empirical findings. Finally, we confirm an observation of Bar-Even et al., made in the very different setting of Saccharomyces cerevisiae, that steady-state expression variance tends to scale with mean expression. AVAILABILITY: The data are all drawn from FlyEx (explained within) and are available at http://flyex.ams.sunysb.edu/FlyEx/.
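To make the division-based idea concrete, here is a minimal sketch of one textbook way such an estimate can work; it is not necessarily the estimator used by the authors. It assumes that a dividing cell with N molecules splits them binomially (each molecule goes to either daughter with probability 1/2) and that fluorescence is F = ν·N for an unknown scale ν. Under those assumptions E[(F1 − F2)²] = ν·E[F1 + F2], so ν can be estimated from daughter pairs and used to convert fluorescence into copy numbers. All function names and parameter values are hypothetical.

```python
import random


def simulate_divisions(n_pairs, molecules, scale, rng):
    """Simulate daughter-cell fluorescence pairs (assumed binomial partitioning)."""
    pairs = []
    for _ in range(n_pairs):
        # Each molecule goes to daughter 1 with probability 1/2.
        n1 = sum(rng.random() < 0.5 for _ in range(molecules))
        n2 = molecules - n1
        pairs.append((scale * n1, scale * n2))
    return pairs


def estimate_scale(pairs):
    """Estimate fluorescence per molecule from daughter pairs.

    Uses E[(F1 - F2)^2] = scale * E[F1 + F2], which holds under
    binomial partitioning with probability 1/2.
    """
    numerator = sum((f1 - f2) ** 2 for f1, f2 in pairs)
    denominator = sum(f1 + f2 for f1, f2 in pairs)
    return numerator / denominator


def estimate_copy_number(fluorescence, scale):
    """Convert a fluorescence reading into an estimated molecule count."""
    return fluorescence / scale


if __name__ == "__main__":
    rng = random.Random(0)
    pairs = simulate_divisions(n_pairs=4000, molecules=100, scale=2.5, rng=rng)
    nu = estimate_scale(pairs)
    print("estimated scale:", nu)
    print("estimated copies for F = 250:", estimate_copy_number(250.0, nu))
```

The estimate improves with the number of daughter pairs; measurement noise, which the abstract says the authors analyze, is ignored in this sketch.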
Subjects
Computational Biology/methods , Gene Expression Profiling/methods , Statistical Models , Proteins/chemistry , Proteins/genetics , Animals , Drosophila melanogaster/genetics , Drosophila melanogaster/metabolism , Fluorescence , Messenger RNA/metabolism , Saccharomyces cerevisiae/genetics , Saccharomyces cerevisiae/metabolism
ABSTRACT
We present GuideScan software for the design of CRISPR guide RNA libraries that can be used to edit coding and noncoding genomic regions. GuideScan produces high-density sets of guide RNAs (gRNAs) for single- and paired-gRNA genome-wide screens. We also show that the trie data structure of GuideScan enables the design of gRNAs that are more specific than those designed by existing tools.
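As an illustration of why a trie suits specificity checks, the following is a generic sketch and not GuideScan's actual implementation. It stores equal-length candidate target sequences in a nested-dictionary trie and enumerates every stored sequence within a given number of mismatches of a query, pruning a branch as soon as the mismatch budget is exceeded. The function names, the `"$"` end marker, and the toy sequences are assumptions made for the example.

```python
def build_trie(kmers):
    """Build a nested-dict trie; the "$" key stores how often a k-mer occurs."""
    root = {}
    for kmer in kmers:
        node = root
        for base in kmer:
            node = node.setdefault(base, {})
        node["$"] = node.get("$", 0) + 1
    return root


def search(trie, query, max_mismatches):
    """Return sorted (sequence, mismatches, occurrences) within the mismatch budget."""
    hits = []

    def walk(node, depth, mismatches, prefix):
        if depth == len(query):
            if "$" in node:
                hits.append((prefix, mismatches, node["$"]))
            return
        for base, child in node.items():
            if base == "$":
                continue
            cost = mismatches + (base != query[depth])
            # Prune: abandon this branch once the budget is exceeded.
            if cost <= max_mismatches:
                walk(child, depth + 1, cost, prefix + base)

    walk(trie, 0, 0, "")
    return sorted(hits)


if __name__ == "__main__":
    trie = build_trie(["ACGT", "ACGA", "TTTT", "ACGT"])
    # Exact match only, then allowing one mismatch.
    print(search(trie, "ACGT", 0))
    print(search(trie, "ACGT", 1))
```

A guide whose search returns only itself (one occurrence, zero mismatches) has no near-identical off-target sites in the indexed set, which is the sense of "specific" this sketch captures.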
Subjects
Algorithms , Clustered Regularly Interspaced Short Palindromic Repeats/genetics , Gene Silencing , Machine Learning , Small Interfering RNA/genetics , Software , CRISPR-Cas Systems/genetics , Chromosome Mapping/methods , RNA Sequence Analysis/methods
ABSTRACT
Here we present HiC-DC, a principled method to estimate the statistical significance (P values) of chromatin interactions from Hi-C experiments. HiC-DC uses hurdle negative binomial regression to account for systematic sources of variation in Hi-C read counts (for example, distance-dependent random polymer ligation, GC content, and mappability bias) and to model zero inflation and overdispersion. Applied to high-resolution Hi-C data in a lymphoblastoid cell line, HiC-DC detects significant interactions at the sub-topologically associating domain level, identifying potential structural and regulatory interactions supported by CTCF binding sites, DNase accessibility, and/or active histone marks. CTCF-associated interactions are most strongly enriched in the middle genomic distance range (~700 kb to 1.5 Mb), while interactions involving actively marked DNase accessible elements are enriched both at short (<500 kb) and longer (>1.5 Mb) genomic distances. There is a striking enrichment of longer-range interactions connecting replication-dependent histone genes on chromosome 6, potentially representing the chromatin architecture at the histone locus body.
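To show what a hurdle negative binomial P value looks like in the simplest case, here is a sketch that is not the HiC-DC code. A hurdle model assigns probability `p_zero` to a zero count and spreads the rest over a zero-truncated negative binomial; the P value for an observed count is the upper-tail probability. In a regression setting the zero probability and the mean would be functions of covariates such as distance, GC content, and mappability; here they are fixed numbers, and all names and parameter values are assumptions.

```python
from math import exp, lgamma, log


def nb_pmf(y, mu, size):
    """Negative binomial pmf parameterised by mean `mu` and dispersion `size`."""
    log_p = (lgamma(y + size) - lgamma(size) - lgamma(y + 1)
             + size * log(size / (size + mu))
             + y * log(mu / (size + mu)))
    return exp(log_p)


def hurdle_nb_pmf(y, p_zero, mu, size):
    """Hurdle pmf: a point mass at zero plus a zero-truncated negative binomial."""
    if y == 0:
        return p_zero
    truncation = 1.0 - nb_pmf(0, mu, size)
    return (1.0 - p_zero) * nb_pmf(y, mu, size) / truncation


def hurdle_nb_pvalue(count, p_zero, mu, size):
    """Upper-tail probability P(Y >= count) under the hurdle model."""
    if count <= 0:
        return 1.0
    below = sum(hurdle_nb_pmf(y, p_zero, mu, size) for y in range(count))
    return max(0.0, 1.0 - below)


if __name__ == "__main__":
    # Hypothetical background for one bin pair: 30% zeros, mean 4, dispersion 2.
    for observed in (1, 5, 10, 20):
        print(observed, hurdle_nb_pvalue(observed, p_zero=0.3, mu=4.0, size=2.0))
```

Separating the zero component from the positive counts is what lets such a model handle zero inflation, while the dispersion parameter handles overdispersion relative to a Poisson model.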
Subjects
Chromatin/metabolism , Computational Biology/methods , Genome/genetics , Genomics/methods , Genetic Models , Animals , Binding Sites/genetics , Tumor Cell Line , Chromatin/genetics , Chromosome Mapping/methods , Human Chromosome Pair 6/genetics , Human Chromosome Pair 6/metabolism , CpG Islands/genetics , Datasets as Topic , Histone Code/genetics , Humans , Mice , Genetic Promoter Regions/genetics , Software
ABSTRACT
A significant challenge of functional genomics is to develop methods for genome-scale acquisition and analysis of cell biological data. Here, we present an integrated method that combines genome-wide genetic perturbation of Saccharomyces cerevisiae with high-content screening to facilitate the genetic description of sub-cellular structures and compartment morphology. As proof of principle, we used a Rad52-GFP marker to examine DNA damage foci in ~20 million single cells from ~5,000 different mutant backgrounds in the context of selected genetic or chemical perturbations. Phenotypes were classified using a machine learning-based automated image analysis pipeline. We identified 345 mutants with elevated numbers of DNA damage foci, almost half of which were identified only in sensitized backgrounds. Subsequent analysis of Vid22, a protein implicated in the DNA damage response, revealed that it acts together with the Sgs1 helicase at sites of DNA damage and preferentially binds G-quadruplex regions of the genome. This approach is extensible to numerous other cell biological markers and experimental systems.