ABSTRACT
Due mainly to the large genome size and the prevalence of repetitive sequences in the nuclear genome of spruce (Picea Mill.), it is very difficult to develop single-copy genomic microsatellite markers. We have developed and characterized 25 polymorphic, single-copy genic microsatellites from white spruce (Picea glauca (Moench) Voss) EST sequences and determined their informativeness in white spruce and black spruce (Picea mariana (Mill.) B.S.P.) and their inheritance in black spruce. White spruce EST sequences from NCBI dbEST were searched for the presence of microsatellite repeats. Forty-seven sequences containing dinucleotide, trinucleotide, tetranucleotide and compound repeats were selected to develop primers. Twenty-five of the designed primer pairs yielded scorable amplicons, with single-locus patterns, and were characterized in 20 individuals each of white spruce and black spruce. All 25 microsatellites were polymorphic in white spruce and 24 in black spruce. The number of alleles at a locus ranged from two to 18, with a mean of 8.8 in white spruce, and from one to 17, with a mean of 7.6 in black spruce. The expected heterozygosity/polymorphic information content ranged from 0.10 to 0.92, with a mean of 0.67 in white spruce, and from 0 to 0.93, with a mean of 0.59 in black spruce. Microsatellites with dinucleotide and compound repeats were more informative than those with trinucleotide and tetranucleotide repeats. Eighteen microsatellite markers that were polymorphic between the parents of a black spruce controlled cross were inherited in a single-locus Mendelian fashion. The microsatellite markers developed can be applied in various genetics, genomics, breeding, and conservation studies and applications.
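As a rough illustration of the informativeness statistics reported above, the sketch below computes expected heterozygosity and polymorphic information content (PIC) for one locus from diploid genotype calls. It assumes the standard textbook definitions (He = 1 - sum of squared allele frequencies, and the Botstein-style PIC formula); the abstract does not say which estimator the authors used, and the function names and toy genotypes are hypothetical.

```python
from collections import Counter
from itertools import combinations


def allele_frequencies(genotypes):
    """Return allele frequencies from diploid genotypes given as (allele, allele) pairs."""
    counts = Counter(allele for genotype in genotypes for allele in genotype)
    total = sum(counts.values())
    return {allele: n / total for allele, n in counts.items()}


def expected_heterozygosity(freqs):
    """He = 1 - sum(p_i^2); assumes Hardy-Weinberg proportions, no small-sample correction."""
    return 1.0 - sum(p * p for p in freqs.values())


def pic(freqs):
    """PIC = 1 - sum(p_i^2) - sum over i<j of 2 * p_i^2 * p_j^2."""
    ps = list(freqs.values())
    homozygous = sum(p * p for p in ps)
    cross = sum(2.0 * (p * p) * (q * q) for p, q in combinations(ps, 2))
    return 1.0 - homozygous - cross


if __name__ == "__main__":
    # Hypothetical allele sizes (bp) for four individuals at one locus.
    toy = [(150, 150), (150, 154), (154, 154), (150, 154)]
    freqs = allele_frequencies(toy)
    print(freqs, expected_heterozygosity(freqs), pic(freqs))
```

A monomorphic locus (one allele) gives 0 for both statistics, which is consistent with the lower bound of 0 reported for black spruce.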
Subject(s)
Plant DNA/genetics, Expressed Sequence Tags/metabolism, Gene Dosage, Microsatellite Repeats/genetics, Picea/genetics, Chi-Square Distribution, Genotype, Inheritance Patterns/genetics, Nucleotide Motifs/genetics
ABSTRACT
Here we develop a high-throughput single-cell ATAC-seq (assay for transposition of accessible chromatin) method to measure physical access to DNA in whole cells. Our approach integrates fluorescence imaging and addressable reagent deposition across a massively parallel (5184) nano-well array, yielding a nearly 20-fold improvement in throughput (up to ~1800 cells/chip, 4-5 h on-chip processing time) and library preparation cost (~81¢ per cell) compared to prior microfluidic implementations. We apply this method to measure regulatory variation in peripheral blood mononuclear cells (PBMCs) and show robust, de novo clustering of single cells by hematopoietic cell type.
Subject(s)
Chromatin Assembly and Disassembly, High-Throughput Screening Assays, Optical Imaging/methods, Single-Cell Analysis/methods, Animals, Cell Line, Genetic Epigenesis, Humans, Mice
ABSTRACT
BACKGROUND: Technological advances have enabled transcriptome characterization of cell types at the single-cell level, providing new biological insights. New methods that enable simple yet high-throughput single-cell expression profiling are highly desirable. RESULTS: Here we report a novel nanowell-based single-cell RNA sequencing system, ICELL8, which enables processing of thousands of cells per sample. The system employs a 5,184-nanowell-containing microchip to capture ~1,300 single cells and process them. Each nanowell contains preprinted oligonucleotides encoding poly-d(T), a unique well barcode, and a unique molecular identifier. The ICELL8 system uses imaging software to identify nanowells containing viable single cells, and only wells with single cells are processed into sequencing libraries. Here, we report the performance and utility of ICELL8 using samples of increasing complexity, from cultured cells to mouse solid tissue samples. Our assessment of the system's ability to discriminate between mixed human and mouse cells showed that ICELL8 has a low cell multiplet rate (< 3%) and low cross-cell contamination. We characterized single-cell transcriptomes of more than a thousand cultured human and mouse cells as well as 468 mouse pancreatic islet cells. We were able to identify distinct cell types in pancreatic islets, including alpha, beta, delta and gamma cells. CONCLUSIONS: Overall, ICELL8 provides efficient and cost-effective single-cell expression profiling of thousands of cells, allowing researchers to decipher single-cell transcriptomes within complex biological samples.
Subject(s)
Gene Expression Profiling/methods, High-Throughput Nucleotide Sequencing/methods, Nanotechnology/methods, RNA Sequence Analysis/methods, Single-Cell Analysis/methods, Tissue Array Analysis/methods, Cell Line, Humans, Islets of Langerhans/cytology, Islets of Langerhans/metabolism
ABSTRACT
To achieve proper spatiotemporal control of gene expression, transcription factors cooperatively assemble onto specific DNA sequences. The ETS domain protein monomer of GABPα and the B-ZIP domain protein dimer of CREB1 cooperatively bind DNA only when the ETS ((C/G)CGGAAGT) and CRE (GTGACGTCAC) motifs overlap precisely, producing the ETS⇔CRE motif ((C/G)CGGAAGTGACGTCAC). We designed a Protein Binding Microarray (PBM) with 60-bp DNAs containing four identical sectors, each with 177,440 features that explore the cooperative interactions between GABPα and CREB1 upon binding the ETS⇔CRE motif. The DNA sequences include all 15-mers of the form (C/G)CGGA--CG-, the ETS⇔CRE motif, and all single nucleotide polymorphisms (SNPs), and occurrences in the human and mouse genomes. CREB1 enhanced GABPα binding to the canonical ETS⇔CRE motif CCGGAAGT two-fold, and up to 23-fold for several SNPs at the beginning and end of the ETS motif, which is suggestive of two separate and distinct allosteric mechanisms of cooperative binding. We show that the ETS⇔CRE array data can be used to identify regions likely cooperatively bound by GABPα and CREB1 in vivo, and demonstrate their ability to identify human genetic variants that might inhibit cooperative binding.
Subject(s)
Binding Sites, Cyclic AMP Response Element-Binding Protein/metabolism, GA-Binding Protein Transcription Factor/metabolism, Nucleotide Motifs, Proto-Oncogene Proteins c-ets/metabolism, Animals, Cell Line, Genetic Loci, Humans, Mice, Oligonucleotide Array Sequence Analysis, Single Nucleotide Polymorphism, Protein Binding, Recombinant Fusion Proteins/metabolism
ABSTRACT
Transcription factor (TF) DNA sequence preferences direct their regulatory activity, but are currently known for only ~1% of eukaryotic TFs. Broadly sampling DNA-binding domain (DBD) types from multiple eukaryotic clades, we determined DNA sequence preferences for >1,000 TFs encompassing 54 different DBD classes from 131 diverse eukaryotes. We find that closely related DBDs almost always have very similar DNA sequence preferences, enabling inference of motifs for ~34% of the ~170,000 known or predicted eukaryotic TFs. Sequences matching both measured and inferred motifs are enriched in chromatin immunoprecipitation sequencing (ChIP-seq) peaks and upstream of transcription start sites in diverse eukaryotic lineages. SNPs defining expression quantitative trait loci in Arabidopsis promoters are also enriched for predicted TF binding sites. Importantly, our motif "library" can be used to identify specific TFs whose binding may be altered by human disease risk alleles. These data present a powerful resource for mapping transcriptional networks across eukaryotes.
Subject(s)
Arabidopsis/genetics, Nucleotide Motifs, DNA Sequence Analysis, Transcription Factors/metabolism, Arabidopsis/metabolism, Chromatin Immunoprecipitation, Humans, Single Nucleotide Polymorphism, Genetic Promoter Regions, Protein Binding, Quantitative Trait Loci
ABSTRACT
Three oxidative products of 5-methylcytosine (5mC) occur in mammalian genomes. We evaluated if these cytosine modifications in a CG dinucleotide altered DNA binding of four B-HLH homodimers and three heterodimers to the E-Box motif CGCAG|GTG. We examined 25 DNA probes containing all combinations of cytosine in a CG dinucleotide and none changed binding except for carboxylation of cytosine (5caC) in the strand CGCAG|GTG. 5caC enhanced binding of all examined B-HLH homodimers and heterodimers, particularly the Tcf3|Ascl1 heterodimer which increased binding ~10-fold. These results highlight a potential function of the oxidative products of 5mC, changing the DNA binding of sequence-specific transcription factors.
Subject(s)
Basic Helix-Loop-Helix Transcription Factors/chemistry, Basic Helix-Loop-Helix Transcription Factors/metabolism, Cytosine/analogs & derivatives, 5-Methylcytosine/chemistry, 5-Methylcytosine/metabolism, Amino Acid Sequence, Animals, Base Sequence, Basic Helix-Loop-Helix Transcription Factors/genetics, Circular Dichroism, Cytosine/chemistry, Cytosine/metabolism, Dinucleoside Phosphates/chemistry, Dinucleoside Phosphates/metabolism, E-Box Elements, Humans, Molecular Models, Molecular Sequence Data, Protein Binding, Protein Multimerization
ABSTRACT
BACKGROUND: EST (expressed sequence tag) sequences and their annotation provide a highly valuable resource for gene discovery, genome sequence annotation, and other genomics studies that can be applied in genetics, breeding and conservation programs for non-model organisms. Conifers are long-lived plants that are ecologically and economically important globally, and have a large genome size. Black spruce (Picea mariana) is a transcontinental species of the North American boreal and temperate forests. However, there are limited transcriptomic and genomic resources for this species. The primary objective of our study was to develop a black spruce transcriptomic resource to facilitate ongoing functional genomics projects related to growth and adaptation to climate change. RESULTS: We conducted bidirectional sequencing of cDNA clones from a standard cDNA library constructed from black spruce needle tissues. We obtained 4,594 high-quality (2,455 5' end and 2,139 3' end) sequence reads, with an average read length of 532 bp. Clustering and assembly of ESTs resulted in 2,731 unique sequences, consisting of 2,234 singletons and 497 contigs. Approximately two-thirds (63%) of unique sequences were functionally annotated. Genes involved in 36 molecular functions and 90 biological processes were discovered, including 24 putative transcription factors and 232 genes involved in photosynthesis. The most abundantly expressed transcripts were associated with photosynthesis, growth factors, stress and disease response, and transcription factors. A total of 216 full-length genes were identified. About 18% (493) of the transcripts were novel, representing an important addition to the GenBank EST database (dbEST). Fifty-seven di-, tri-, tetra- and penta-nucleotide simple sequence repeats were identified.
CONCLUSIONS: We have developed the first high quality EST resource for black spruce and identified 493 novel transcripts, which may be species-specific related to life history and ecological traits. We have also identified full-length genes and microsatellite-containing ESTs. Based on EST sequence similarities, black spruce showed close evolutionary relationships with congeneric Picea glauca and Picea sitchensis compared to other Pinaceae members and angiosperms. The EST sequences reported here provide an important resource for genome annotation, functional and comparative genomics, molecular breeding, conservation and management studies and applications in black spruce and related conifer species.
Subject(s)
Expressed Sequence Tags/metabolism, Genomics, Molecular Sequence Annotation/methods, Picea/genetics, Base Sequence, Conserved Sequence/genetics, Contig Mapping, Complementary DNA/genetics, Protein Databases, Molecular Evolution, Plant Gene Expression Regulation, Gene Ontology, Plant Genes/genetics, Genetic Association Studies, Molecular Sequence Data, Multigene Family/genetics, Peptides/genetics, Pinus/genetics, Messenger RNA/genetics, Messenger RNA/metabolism, Nucleic Acid Sequence Homology
ABSTRACT
To evaluate the effect of CG methylation on DNA binding of sequence-specific B-ZIP transcription factors (TFs) in a high-throughput manner, we enzymatically methylated the cytosine in the CG dinucleotide on protein binding microarrays. Two Agilent DNA array designs were used. One contained 40,000 features using de Bruijn sequences where each 8-mer occurs 32 times in various positions in the DNA sequence. The second contained 180,000 features with each CG containing 8-mer occurring three times. The first design was better for identification of binding motifs, while the second was better for quantification. Using this novel technology, we show that CG methylation enhanced binding for CEBPA and CEBPB and inhibited binding for CREB, ATF4, JUN, JUND, CEBPD, and CEBPG. The CEBPB|ATF4 heterodimer bound a novel motif CGAT|GCAA 10-fold better when methylated. The electrophoretic mobility shift assay (EMSA) confirmed these results. CEBPB ChIP-seq data using primary female mouse dermal fibroblasts with 50× methylome coverage for each strand indicate that the methylated sequences well-bound on the arrays are also bound in vivo. CEBPB bound 39% of the methylated canonical 10-mers ATTGC|GCAAT in the mouse genome. After ATF4 protein induction by thapsigargin which results in ER stress, CEBPB binds methylated CGAT|GCAA in vivo, recapitulating what was observed on the arrays. This methodology can be used to identify new methylated DNA sequences preferentially bound by TFs, which may be functional in vivo.
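The de Bruijn design mentioned above relies on a sequence in which every k-mer appears exactly once per cycle. The sketch below generates such a sequence with the standard Lyndon-word construction and checks the coverage property for a small k. It only illustrates the general idea and does not reproduce the actual Agilent array layout (the source reports each 8-mer occurring 32 times across 40,000 features); the function names are hypothetical.

```python
from collections import Counter


def de_bruijn(alphabet, k):
    """Return a cyclic de Bruijn sequence of order k over the given alphabet."""
    n = len(alphabet)
    a = [0] * (n * k)
    out = []

    def db(t, p):
        if t > k:
            # Emit the current Lyndon word when its length divides k.
            if k % p == 0:
                out.extend(a[1:p + 1])
        else:
            a[t] = a[t - p]
            db(t + 1, p)
            for j in range(a[t - p] + 1, n):
                a[t] = j
                db(t + 1, t)

    db(1, 1)
    return "".join(alphabet[i] for i in out)


def kmer_counts(cyclic_seq, k):
    """Count k-mers in a cyclic sequence by wrapping the first k-1 bases to the end."""
    linear = cyclic_seq + cyclic_seq[:k - 1]
    return Counter(linear[i:i + k] for i in range(len(cyclic_seq)))


if __name__ == "__main__":
    seq = de_bruijn("ACGT", 3)
    counts = kmer_counts(seq, 3)
    print(len(seq), len(counts), set(counts.values()))
```

For k = 8 the same construction yields a cyclic sequence of 4^8 = 65,536 bases, which would then have to be cut into overlapping probes; how the authors tiled it is not stated in the abstract.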
Subject(s)
Activating Transcription Factor 4/metabolism, CCAAT-Enhancer-Binding Protein-beta/metabolism, CpG Islands, DNA Methylation, Activating Transcription Factor 4/chemistry, Animals, Base Sequence, Binding Sites, CCAAT-Enhancer-Binding Protein-beta/chemistry, Female, Fibroblasts, Mice, Nucleotide Motifs, Position-Specific Scoring Matrices, Protein Binding/drug effects, Protein Multimerization, Thapsigargin/immunology, Transcription Factors/metabolism
ABSTRACT
Previously, we identified 8-bp-long DNA sequences (8-mers) that localize in human proximal promoters and grouped them into known transcription factor binding sites (TFBS). We now examine split 8-mers consisting of two 4-mers separated by 1 to 30 bp (X(4)-N(1-30)-X(4)) to identify pairs of TFBS that localize in proximal promoters at a precise distance. These include two overlapping TFBS: the ETS⇔ETS motif ((C/G)CCGGAAGCGGAA) and the ETS⇔CRE motif ((C/G)CGGAAGTGACGTCAC). The nucleotides in bold are part of both TFBS. Molecular modeling shows that the ETS⇔CRE motif can be bound simultaneously by both the ETS and the B-ZIP domains without protein-protein clashes. The electrophoretic mobility shift assay (EMSA) shows that the ETS protein GABPα and the B-ZIP protein CREB preferentially bind to the ETS⇔CRE motif only when the two TFBS overlap precisely. In contrast, the ETS domain of ETV5 and CREB interfere with each other for binding the ETS⇔CRE. The 11-mer (CGGAAGTGACG), the conserved part of the ETS⇔CRE motif, occurs 226 times in the human genome and 83% are in known regulatory regions. In vivo GABPα and CREB ChIP-seq peaks identified the ETS⇔CRE as the most enriched motif occurring in promoters of genes involved in mRNA processing, cellular catabolic processes, and stress response, suggesting that a specific class of genes is regulated by this composite motif.
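The split 8-mer form X(4)-N(1-30)-X(4) described above can be enumerated with a simple sliding window. The sketch below counts every (left 4-mer, gap length, right 4-mer) combination in one sequence. It is a minimal illustration only: the authors' promoter-localization statistic is not reproduced, and the function name, parameters, and example sequence are hypothetical.

```python
from collections import Counter


def count_split_kmers(seq, half=4, min_gap=1, max_gap=30):
    """Count (left, gap, right) triples where two `half`-mers are separated by `gap` bases."""
    seq = seq.upper()
    n = len(seq)
    counts = Counter()
    for i in range(n - half + 1):
        left = seq[i:i + half]
        for gap in range(min_gap, max_gap + 1):
            j = i + half + gap  # start of the right half
            if j + half > n:
                break  # larger gaps would also run past the end
            counts[(left, gap, seq[j:j + half])] += 1
    return counts


if __name__ == "__main__":
    # Toy sequence: AAAA and GGGG separated by a single base.
    counts = count_split_kmers("AAAACGGGG")
    for key, value in counts.items():
        print(key, value)
```

Applied to a set of promoter sequences aligned on the transcription start site, the same triples could be tallied by position to look for pairs that recur at a fixed spacing.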
Subject(s)
Cyclic AMP Response Element-Binding Protein/metabolism, GA-Binding Protein Transcription Factor/metabolism, Nucleotide Motifs, Genetic Promoter Regions, Proto-Oncogene Proteins c-ets/genetics, Animals, Base Sequence, Binding Sites, Conserved Sequence, Cyclic AMP Response Element-Binding Protein/chemistry, DNA Methylation, GA-Binding Protein Transcription Factor/chemistry, Humans, Mice, Molecular Docking Simulation, Nucleic Acid Conformation, Protein Binding, Protein Conformation, Proto-Oncogene Proteins c-ets/chemistry
ABSTRACT
BACKGROUND: Genetic maps provide an important genomic resource for understanding genome organization and evolution, comparative genomics, mapping genes and quantitative trait loci, and associating genomic segments with phenotypic traits. Spruce (Picea) genomics work is quite challenging, mainly because of the extremely large size and highly repetitive nature of its genome, which remains unsequenced and poorly understood, and the general lack of advanced-generation pedigrees. Our goal was to construct a high-density genetic linkage map of black spruce (Picea mariana, 2n = 24), which is a predominant, transcontinental species of the North American boreal and temperate forests, with high ecological and economic importance. RESULTS: We have developed a near-saturated and complete genetic linkage map of black spruce using a three-generation outbred pedigree and amplified fragment length polymorphism (AFLP), selectively amplified microsatellite polymorphic loci (SAMPL), expressed sequence tag polymorphism (ESTP), and microsatellite (mostly cDNA based) markers. Maternal, paternal, and consensus genetic linkage maps were constructed. The maternal, paternal, and consensus maps in our study consistently coalesced into 12 linkage groups, corresponding to the haploid chromosome number (1n = 1x = 12) of the genus Picea. The maternal map had 816 and the paternal map 743 markers distributed over 12 linkage groups each. The consensus map consisted of 1,111 markers distributed over 12 linkage groups, and covered almost the entire (> 97%) black spruce genome. The mapped markers included 809 AFLPs, 255 SAMPLs, 42 microsatellites, and 5 ESTPs. Total estimated length of the genetic map was 1,770 cM, with an average of one marker every 1.6 cM. The maternal, paternal and consensus genetic maps aligned almost perfectly. CONCLUSION: We have constructed the first high-density to near-saturated genetic linkage map of black spruce, with greater than 97% genome coverage. Also, this is the first genetic map based on a three-generation outbred pedigree in the genus Picea. The genome length in P. mariana is likely to be about 1,800 cM. The genetic maps developed in our study can serve as a reference map for various genomics studies and applications in Picea and Pinaceae.