ABSTRACT
To fully understand animal transcription networks, it is essential to accurately measure the spatial and temporal expression patterns of transcription factors and their targets. We describe a registration technique that takes image-based data from hundreds of Drosophila blastoderm embryos, each costained for a reference gene and one of a set of genes of interest, and builds a model VirtualEmbryo. This model captures in a common framework the average expression patterns for many genes in spite of significant variation in morphology and expression between individual embryos. We establish the method's accuracy by showing that relationships between a pair of genes' expression inferred from the model are nearly identical to those measured in embryos costained for the pair. We present a VirtualEmbryo containing data for 95 genes at six time cohorts. We show that known gene-regulatory interactions can be automatically recovered from this data set and predict hundreds of new interactions.
Subjects
Drosophila melanogaster/genetics , Gene Regulatory Networks , Genetic Models , Animals , Blastoderm , Drosophila melanogaster/metabolism , Nonmammalian Embryo/metabolism , Developmental Gene Expression Regulation , Insect Genes
ABSTRACT
Identifying functional enhancer elements in metazoan systems is a major challenge. Large-scale validation of enhancers predicted by ENCODE reveals false-positive rates of at least 70%. We used the pregastrula-patterning network of Drosophila melanogaster to demonstrate that loss in accuracy in held-out data results from heterogeneity of functional signatures in enhancer elements. We show that at least two classes of enhancers are active during early Drosophila embryogenesis and that by focusing on a single, relatively homogeneous class of elements, greater than 98% prediction accuracy can be achieved in a balanced, completely held-out test set. The class of well-predicted elements is composed predominantly of enhancers driving multistage segmentation patterns, which we designate segmentation driving enhancers (SDE). Prediction is driven by the DNA occupancy of early developmental transcription factors, with almost no additional power derived from histone modifications. We further show that improved accuracy is not a property of a particular prediction method: after conditioning on the SDE set, naïve Bayes and logistic regression perform as well as more sophisticated tools. Applying this method to a genome-wide scan, we predict 1,640 SDEs that cover 1.6% of the genome. An analysis of 32 SDEs using whole-mount embryonic imaging of stably integrated reporter constructs chosen throughout our prediction rank-list showed that >90% drove expression patterns. We achieved 86.7% precision on a genome-wide scan, with an estimated recall of at least 98%, indicating high accuracy and completeness in annotating this class of functional elements.
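As a reading aid, the precision and recall figures quoted above follow the usual definitions: precision is the fraction of predicted elements that are real, and recall is the fraction of real elements that are predicted. A minimal Python sketch is given below; the function name and the counts are hypothetical and are not taken from the study.

```python
def precision_recall(true_pos, false_pos, false_neg):
    """Return (precision, recall) from confusion-matrix counts."""
    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    return precision, recall


# Hypothetical counts for illustration only (not data from the study).
precision, recall = precision_recall(true_pos=90, false_pos=10, false_neg=30)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

A high recall with lower precision, as in the abstract, means that few real elements are missed while some predictions remain unconfirmed.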
Subjects
Drosophila Proteins , Nonmammalian Embryo/embryology , Embryonic Development/physiology , Genetic Enhancer Elements/physiology , DNA Sequence Analysis , Transcription Factors , Animals , Drosophila Proteins/genetics , Drosophila Proteins/metabolism , Drosophila melanogaster , Genome-Wide Association Study , Transcription Factors/genetics , Transcription Factors/metabolism
ABSTRACT
Translation rate per mRNA molecule correlates positively with mRNA abundance. As a result, protein levels do not scale linearly with mRNA levels, but instead scale with the abundance of mRNA raised to the power of an 'amplification exponent'. Here we show that to quantitate translational control, the translation rate must be decomposed into two components. One, TRmD, depends on the mRNA level and defines the amplification exponent. The other, TRmIND, is independent of mRNA amount and impacts the correlation coefficient between protein and mRNA levels. We show that in Saccharomyces cerevisiae TRmD represents ~20% of the variance in translation and directs an amplification exponent of 1.20 with a 95% confidence interval [1.14, 1.26]. TRmIND constitutes the remaining ~80% of the variance in translation and explains ~5% of the variance in protein expression. We also find that TRmD and TRmIND are preferentially determined by different mRNA sequence features: TRmIND by the length of the open reading frame and TRmD both by a ~60 nucleotide element that spans the initiating AUG and by codon and amino acid frequency. Our work provides more appropriate estimates of translational control and implies that TRmIND is under different evolutionary selective pressures than TRmD.
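The scaling described above can be illustrated with a small Python sketch. It assumes, purely for illustration, that protein level is proportional to mRNA abundance raised to an exponent b, so that b is the slope of log(protein) against log(mRNA). The function name, the proportionality constant and the synthetic values are hypothetical; only the exponent 1.20 comes from the abstract, and the sketch is not the estimation procedure used in the study.

```python
import math


def fit_amplification_exponent(mrna, protein):
    """Least-squares slope of log(protein) against log(mRNA)."""
    xs = [math.log(m) for m in mrna]
    ys = [math.log(p) for p in protein]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


# Synthetic, noise-free data: protein = 3 * mRNA^1.20 (hypothetical values).
mrna = [1.0, 2.0, 5.0, 10.0, 50.0, 100.0]
protein = [3.0 * m ** 1.20 for m in mrna]
exponent = fit_amplification_exponent(mrna, protein)
print(f"fitted exponent: {exponent:.2f}")
```

With real measurements the points scatter around the fitted line; in the abstract's terms, that scatter is where the mRNA-independent component of translation would show up, lowering the protein-mRNA correlation without changing the slope.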
Subjects
Fungal Gene Expression Regulation , Protein Biosynthesis/genetics , Messenger RNA/genetics , Saccharomyces cerevisiae/genetics , Algorithms , Base Sequence , Codon/genetics , Initiator Codon/genetics , Genetic Models , Open Reading Frames/genetics , Translational Peptide Chain Initiation/genetics , Messenger RNA/metabolism , Saccharomyces cerevisiae/metabolism , Saccharomyces cerevisiae Proteins/genetics , Saccharomyces cerevisiae Proteins/metabolism
ABSTRACT
Identifying protein-protein interactions (PPIs) at an acceptable false discovery rate (FDR) is challenging. Previously we identified several hundred PPIs from affinity purification-mass spectrometry (AP-MS) data for the bacteria Escherichia coli and Desulfovibrio vulgaris. These two interactomes have lower FDRs than any of the nine interactomes proposed previously for bacteria and are more enriched in PPIs validated by other data than the nine earlier interactomes. To more thoroughly determine the accuracy of ours or other interactomes and to discover further PPIs de novo, here we present a quantitative tagless method that employs iTRAQ MS to measure the copurification of endogenous proteins through orthogonal chromatography steps. 5273 fractions from a four-step fractionation of a D. vulgaris protein extract were assayed, resulting in the detection of 1242 proteins. Protein partners from our D. vulgaris and E. coli AP-MS interactomes copurify as frequently as pairs belonging to three benchmark data sets of well-characterized PPIs. In contrast, the protein pairs from the nine other bacterial interactomes copurify two- to 20-fold less often. We also identify 200 high confidence D. vulgaris PPIs based on tagless copurification and colocalization in the genome. These PPIs are as strongly validated by other data as our AP-MS interactomes and overlap with our AP-MS interactome for D. vulgaris within 3% of expectation, once FDRs and false negative rates are taken into account. Finally, we reanalyzed data from two quantitative tagless screens of human cell extracts. We estimate that the novel PPIs reported in these studies have an FDR of at least 85% and find that less than 7% of the novel PPIs identified in each screen overlap. Our results establish that a quantitative tagless method can be used to validate and identify PPIs, but that such data must be analyzed carefully to minimize the FDR.
Subjects
Bacterial Proteins/metabolism , Desulfovibrio vulgaris/metabolism , Escherichia coli/metabolism , Proteomics/methods , Affinity Chromatography/methods , Mass Spectrometry/methods , Protein Interaction Mapping/methods , Protein Interaction Maps
ABSTRACT
Numerous affinity purification-mass spectrometry (AP-MS) and yeast two-hybrid screens have each defined thousands of pairwise protein-protein interactions (PPIs), most of which are between functionally unrelated proteins. The accuracy of these networks, however, is under debate. Here, we present an AP-MS survey of the bacterium Desulfovibrio vulgaris together with a critical reanalysis of nine published bacterial yeast two-hybrid and AP-MS screens. We have identified 459 high confidence PPIs from D. vulgaris and 391 from Escherichia coli. Compared with the nine published interactomes, our two networks are smaller, are much less highly connected, and have significantly lower false discovery rates. In addition, our interactomes are much more enriched in protein pairs that are encoded in the same operon, have similar functions, and are reproducibly detected in other physical interaction assays than the pairs reported in prior studies. Our work establishes more stringent benchmarks for the properties of protein interactomes and suggests that bona fide PPIs much more frequently involve protein partners that are annotated with similar functions or that can be validated in independent assays than earlier studies suggested.
Subjects
Bacterial Proteins/metabolism , Computational Biology/methods , Desulfovibrio vulgaris/metabolism , Escherichia coli/metabolism , Affinity Chromatography , Protein Databases , Mass Spectrometry , Protein Interaction Mapping , Protein Interaction Maps , Proteomics/methods , Two-Hybrid System Techniques
ABSTRACT
In animals, each sequence-specific transcription factor typically binds to thousands of genomic regions in vivo. Our previous studies of 20 transcription factors show that most genomic regions bound at high levels in Drosophila blastoderm embryos are known or probable functional targets, but genomic regions occupied only at low levels have characteristics suggesting that most are not involved in the cis-regulation of transcription. Here we use transgenic reporter gene assays to directly test the transcriptional activity of 104 genomic regions bound at different levels by the 20 transcription factors. Fifteen genomic regions were selected based solely on the DNA occupancy level of the transcription factor Kruppel. Five of the six most highly bound regions drive blastoderm patterns of reporter transcription. In contrast, only one of the nine lowly bound regions drives transcription at this stage and four of them are not detectably active at any stage of embryogenesis. A larger set of 89 genomic regions chosen using criteria designed to identify functional cis-regulatory regions supports the same trend: genomic regions occupied at high levels by transcription factors in vivo drive patterned gene expression, whereas those occupied only at lower levels mostly do not. These results support studies that indicate that the high cellular concentrations of sequence-specific transcription factors drive extensive, low-occupancy, nonfunctional interactions within the accessible portions of the genome.
Subjects
DNA/metabolism , Drosophila Proteins/metabolism , Drosophila melanogaster/genetics , Developmental Gene Expression Regulation , Reporter Genes/genetics , Transcription Factors/metabolism , Animals , Genetically Modified Animals , Drosophila Proteins/genetics , Drosophila melanogaster/embryology , Nonmammalian Embryo/metabolism , Female , Insect Genome/genetics , Kruppel-Like Transcription Factors/metabolism , Male , Protein Binding/genetics
ABSTRACT
Transcription factors that drive complex patterns of gene expression during animal development bind to thousands of genomic regions, with quantitative differences in binding across bound regions mediating their activity. While we now have tools to characterize the DNA affinities of these proteins and to precisely measure their genome-wide distribution in vivo, our understanding of the forces that determine where, when, and to what extent they bind remains primitive. Here we use a thermodynamic model of transcription factor binding to evaluate the contribution of different biophysical forces to the binding of five regulators of early embryonic anterior-posterior patterning in Drosophila melanogaster. Predictions based on DNA sequence and in vitro protein-DNA affinities alone achieve a correlation of ~0.4 with experimental measurements of in vivo binding. Incorporating cooperativity and competition among the five factors, and accounting for spatial patterning by modeling binding in every nucleus independently, had little effect on prediction accuracy. A major source of error was the prediction of binding events that do not occur in vivo, which we hypothesized reflected reduced accessibility of chromatin. To test this, we incorporated experimental measurements of genome-wide DNA accessibility into our model, effectively restricting predicted binding to regions of open chromatin. This dramatically improved our predictions to a correlation of 0.6-0.9 for various factors across known target genes. Finally, we used our model to quantify the roles of DNA sequence, accessibility, and binding competition and cooperativity. Our results show that, in regions of open chromatin, binding can be predicted almost exclusively by the sequence specificity of individual factors, with a minimal role for protein interactions.
We suggest that a combination of experimentally determined chromatin accessibility data and simple computational models of transcription factor binding may be used to predict the binding landscape of any animal transcription factor with significant precision.
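The idea of restricting sequence-based predictions to open chromatin can be sketched with a deliberately simplified single-site model in Python. The sketch assumes that occupancy of one site follows a two-state equilibrium, K*c / (1 + K*c), and that accessibility enters as a multiplicative weight between 0 and 1. Both choices are assumptions made here for illustration, and the function names and numbers are hypothetical; this is not the model used in the study, which also considers several factors, cooperativity and competition.

```python
def site_occupancy(concentration, affinity):
    """Equilibrium occupancy of a single site: K*c / (1 + K*c)."""
    weight = concentration * affinity
    return weight / (1.0 + weight)


def accessible_occupancy(concentration, affinity, accessibility):
    """Scale sequence-based occupancy by accessibility (0 = closed, 1 = open)."""
    return accessibility * site_occupancy(concentration, affinity)


# Hypothetical values: the same strong site in open and in closed chromatin.
open_site = accessible_occupancy(concentration=1.0, affinity=4.0, accessibility=1.0)
closed_site = accessible_occupancy(concentration=1.0, affinity=4.0, accessibility=0.0)
print(open_site, closed_site)
```

In this toy setting a strong recognition sequence in closed chromatin is predicted to be unbound, which is the kind of false-positive prediction the abstract says accessibility data removes.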
Subjects
DNA-Binding Proteins/metabolism , Drosophila melanogaster/growth & development , Drosophila melanogaster/genetics , Developmental Gene Expression Regulation , Transcription Factors/metabolism , Animals , Base Sequence , Binding Sites/genetics , Chromatin/chemistry , Chromatin/genetics , Computational Biology , DNA-Binding Proteins/genetics , Insect Genome , Genetic Models , Transcription Factors/genetics
ABSTRACT
Differences in the level, timing, or location of gene expression can contribute to alternative phenotypes at the molecular and organismal level. Understanding the origins of expression differences is complicated by the fact that organismal morphology and gene regulatory networks could potentially vary even between closely related species. To assess the scope of such changes, we used high-resolution imaging methods to measure mRNA expression in blastoderm embryos of Drosophila yakuba and Drosophila pseudoobscura and assembled these data into cellular resolution atlases, where expression levels for 13 genes in the segmentation network are averaged into species-specific, cellular resolution morphological frameworks. We demonstrate that the blastoderm embryos of these species differ in their morphology in terms of size, shape, and number of nuclei. We present an approach to compare cellular gene expression patterns between species, while accounting for varying embryo morphology, and apply it to our data and an equivalent dataset for Drosophila melanogaster. Our analysis reveals that all individual genes differ quantitatively in their spatio-temporal expression patterns between these species, primarily in terms of their relative position and dynamics. Despite many small quantitative differences, cellular gene expression profiles for the whole set of genes examined are largely similar. This suggests that cell types at this stage of development are conserved, though they can differ in their relative position by up to 3-4 cell widths and in their relative proportion between species by as much as 5-fold. Quantitative differences in the dynamics and relative level of a subset of genes between corresponding cell types may reflect altered regulatory functions between species. 
Our results emphasize that transcriptional networks can diverge over short evolutionary timescales and that even small changes can lead to distinct output in terms of the placement and number of equivalent cells.
Subjects
Body Patterning/genetics , Drosophila Proteins/metabolism , Drosophila/embryology , Drosophila/genetics , Animals , Biological Evolution , Blastoderm/growth & development , Drosophila Proteins/genetics , Gene Expression Profiling , Developmental Gene Expression Regulation , Gene Regulatory Networks/genetics , Fluorescence In Situ Hybridization , Species Specificity
ABSTRACT
Changes in gene expression play an important role in evolution, yet the molecular mechanisms underlying regulatory evolution are poorly understood. Here we compare genome-wide binding of the six transcription factors that initiate segmentation along the anterior-posterior axis in embryos of two closely related species: Drosophila melanogaster and Drosophila yakuba. Where we observe binding by a factor in one species, we almost always observe binding by that factor to the orthologous sequence in the other species. Levels of binding, however, vary considerably. The magnitude and direction of the interspecies differences in binding levels of all six factors are strongly correlated, suggesting a role for chromatin or other factor-independent forces in mediating the divergence of transcription factor binding. Nonetheless, factor-specific quantitative variation in binding is common, and we show that it is driven to a large extent by the gain and loss of cognate recognition sequences for the given factor. We find only a weak correlation between binding variation and regulatory function. These data provide the first genome-wide picture of how modest levels of sequence divergence between highly morphologically similar species affect a system of coordinately acting transcription factors during animal development, and highlight the dominant role of quantitative variation in transcription factor binding over short evolutionary distances.
Subjects
Biological Evolution , Drosophila/genetics , Drosophila/metabolism , Transcription Factors/metabolism , Animals , Binding Sites , DNA/genetics , DNA/metabolism , Drosophila/embryology , Insect Genes , Insect Genome , Principal Component Analysis , Protein Binding , Transcription Factors/genetics
ABSTRACT
Cell membranes represent the "front line" of cellular defense and the interface between a cell and its environment. To determine the range of proteins and protein complexes that are present in the cell membranes of a target organism, we have utilized a "tagless" process for the system-wide isolation and identification of native membrane protein complexes. As an initial subject for study, we have chosen the Gram-negative sulfate-reducing bacterium Desulfovibrio vulgaris. With this tagless methodology, we have identified about two-thirds of the outer membrane-associated proteins anticipated. Approximately three-fourths of these appear to form homomeric complexes. Statistical and machine-learning methods used to analyze data compiled over multiple experiments revealed networks of additional protein-protein interactions providing insight into heteromeric contacts made between proteins across this region of the cell. Taken together, these results establish a D. vulgaris outer membrane protein data set that will be essential for the detection and characterization of environment-driven changes in the outer membrane proteome and in the modeling of stress response pathways. The workflow utilized here should be effective for the global characterization of membrane protein complexes in a wide range of organisms.
Subjects
Bacterial Outer Membrane Proteins/isolation & purification , Desulfovibrio vulgaris/chemistry , High-Throughput Screening Assays/methods , Membrane Proteins/isolation & purification , Multiprotein Complexes/isolation & purification , Bacterial Outer Membrane Proteins/chemistry , Cell Membrane/chemistry , Ion Exchange Chromatography , Desulfovibrio vulgaris/enzymology , Detergents/chemistry , Polyacrylamide Gel Electrophoresis , Escherichia coli/chemistry , Mass Spectrometry , Membrane Proteins/chemistry , Molecular Weight , Multiprotein Complexes/chemistry , Periplasm/chemistry , Periplasm/enzymology , Protein Interaction Mapping/methods , Protein Interaction Maps , Proteome/chemistry , Proteomics/methods , Amino Acid Sequence Homology , Solubility
ABSTRACT
An unbiased survey has been made of the stable, most abundant multi-protein complexes in Desulfovibrio vulgaris Hildenborough (DvH) that are larger than Mr approximately 400 k. The quaternary structures for 8 of the 16 complexes purified during this work were determined by single-particle reconstruction of negatively stained specimens, a success rate approximately 10 times greater than that of previous "proteomic" screens. In addition, the subunit compositions and stoichiometries of the remaining complexes were determined by biochemical methods. Our data show that the structures of only two of these large complexes, out of the 13 in this set that have recognizable functions, can be modeled with confidence based on the structures of known homologs. These results indicate that there is significantly greater variability in the way that homologous prokaryotic macromolecular complexes are assembled than has generally been appreciated. As a consequence, we suggest that relying solely on previously determined quaternary structures for homologous proteins may not be sufficient to properly understand their role in another cell of interest.
Subjects
Bacterial Proteins/chemistry , Desulfovibrio vulgaris/metabolism , Bacterial Proteins/isolation & purification , X-Ray Crystallography , Protein Databases , Desulfovibrio vulgaris/chemistry , Molecular Models , Multiprotein Complexes/chemistry , Multiprotein Complexes/metabolism , Protein Conformation
ABSTRACT
We have developed an information-dependent, iterative MS/MS acquisition (IMMA) tool for improving MS/MS efficiency, increasing proteome coverage, and shortening analysis time for high-throughput proteomics applications based on the LC-MALDI MS/MS platform. The underlying principle of IMMA is to limit MS/MS analyses to a subset of molecular ions that are likely to identify a maximum number of proteins. IMMA reduces redundancy of MS/MS analyses by excluding from the precursor ion peak lists proteotypic peptides derived from the already identified proteins and uses a retention time prediction algorithm to limit the degree of false exclusions. It also increases the utilization rate of MS/MS spectra by removing "low value" unidentifiable targets like nonpeptides and peptides carrying large loads of modifications, which are flagged by their "nonpeptide" excess-to-nominal mass ratios. For some samples, IMMA increases the number of identified proteins by ~20-40% when compared to the data dependent methods. IMMA terminates an MS/MS run at the operator-defined point when "costs" (e.g., time of analysis) start to overrun "benefits" (e.g., number of identified proteins), without prior knowledge of sample contents and complexity. To facilitate analysis of closely related samples, IMMA's inclusion list functionality is currently under development.
Subjects
Liquid Chromatography/methods , Proteins/analysis , Proteomics/methods , Electrospray Ionization Mass Spectrometry/methods , Matrix-Assisted Laser Desorption-Ionization Mass Spectrometry/methods , Tandem Mass Spectrometry/methods , Algorithms , Ions , Proteins/chemistry , Software , Workflow
ABSTRACT
Identifying the genomic regions bound by sequence-specific regulatory factors is central both to deciphering the complex DNA cis-regulatory code that controls transcription in metazoans and to determining the range of genes that shape animal morphogenesis. We used whole-genome tiling arrays to map sequences bound in Drosophila melanogaster embryos by the six maternal and gap transcription factors that initiate anterior-posterior patterning. We find that these sequence-specific DNA binding proteins bind with quantitatively different specificities to highly overlapping sets of several thousand genomic regions in blastoderm embryos. Specific high- and moderate-affinity in vitro recognition sequences for each factor are enriched in bound regions. This enrichment, however, is not sufficient to explain the pattern of binding in vivo and varies in a context-dependent manner, demonstrating that higher-order rules must govern targeting of transcription factors. The more highly bound regions include all of the over 40 well-characterized enhancers known to respond to these factors as well as several hundred putative new cis-regulatory modules clustered near developmental regulators and other genes with patterned expression at this stage of embryogenesis. The new targets include most of the microRNAs (miRNAs) transcribed in the blastoderm, as well as all major zygotically transcribed dorsal-ventral patterning genes, whose expression we show to be quantitatively modulated by anterior-posterior factors. In addition to these highly bound regions, there are several thousand regions that are reproducibly bound at lower levels. However, these poorly bound regions are, collectively, far more distant from genes transcribed in the blastoderm than highly bound regions; are preferentially found in protein-coding sequences; and are less conserved than highly bound regions. 
Together these observations suggest that many of these poorly bound regions are not involved in early-embryonic transcriptional regulation, and a significant proportion may be nonfunctional. Surprisingly, for five of the six factors, their recognition sites are not unambiguously more constrained evolutionarily than the immediate flanking DNA, even in more highly bound and presumably functional regions, indicating that comparative DNA sequence analysis is limited in its ability to identify functional transcription factor targets.
Subjects
Blastoderm/metabolism , Drosophila melanogaster/embryology , Transcription Factors/metabolism , Animals , Binding Sites , DNA/metabolism , Molecular Evolution , MicroRNAs/metabolism
ABSTRACT
BACKGROUND: The correlation between the expression levels of transcription factors and their target genes can be used to infer interactions within animal regulatory networks, but current methods are limited in their ability to make correct predictions. RESULTS: Here we describe a novel approach which uses nonparametric statistics to generate ordinary differential equation (ODE) models from expression data. Compared to other dynamical methods, our approach requires minimal information about the mathematical structure of the ODE; it does not use qualitative descriptions of interactions within the network; and it employs new statistics to protect against over-fitting. It generates spatio-temporal maps of factor activity, highlighting the times and spatial locations at which different regulators might affect target gene expression levels. We identify an ODE model for eve mRNA pattern formation in the Drosophila melanogaster blastoderm and show that this reproduces the experimental patterns well. Compared to a non-dynamic, spatial-correlation model, our ODE gives 59% better agreement to the experimentally measured pattern. Our model suggests that protein factors frequently have the potential to behave as both an activator and inhibitor for the same cis-regulatory module depending on the factors' concentration, and implies different modes of activation and repression. CONCLUSIONS: Our method provides an objective quantification of the regulatory potential of transcription factors in a network, is suitable for both low- and moderate-dimensional gene expression datasets, and includes improvements over existing dynamic and static models.
Subjects
Drosophila melanogaster/embryology , Gene Expression Profiling , Gene Regulatory Networks , Biological Models , Animals , Blastoderm , Drosophila Proteins/genetics , Drosophila melanogaster/genetics , Developmental Gene Expression Regulation , Homeodomain Proteins/genetics , Proteins/genetics , Transcription Factors/genetics , Genetic Transcription
ABSTRACT
To facilitate a direct interface between protein separation by PAGE and protein identification by mass spectrometry, we developed a multichannel system that continuously collects fractions as protein bands migrate off the bottom of gel electrophoresis columns. The device was constructed using several short linear gel columns, each of a different percent acrylamide, to achieve a separation power similar to that of a long gradient gel. A "Counter Free-Flow" elution technique then allows continuous and simultaneous fraction collection from multiple channels at low cost. We demonstrate that rapid, high-resolution separation of a complex protein mixture can be achieved on this system using SDS-PAGE. In a 2.5 h electrophoresis run, for example, each sample was separated and eluted into 48-96 fractions over a mass range of approximately 10-150 kDa; sample recovery rates were 50% or higher; each channel was loaded with up to 0.3 mg of protein in 0.4 mL; and a purified band was eluted in two to three fractions (200 µL/fraction). Similar results were obtained when running native gel electrophoresis, but protein aggregation limited the loading capacity to about 50 µg per channel and reduced resolution.
Subjects
Polyacrylamide Gel Electrophoresis/instrumentation , High-Throughput Screening Assays , Proteins/isolation & purification , Bacterial Proteins/chemistry , Bacterial Proteins/isolation & purification , Desulfovibrio vulgaris/chemistry , Polyacrylamide Gel Electrophoresis/methods , Mass Spectrometry , Molecular Weight , Proteins/chemistry , Time Factors
ABSTRACT
We developed a multichannel gel electrophoresis system that continuously collects fractions as protein bands migrate to the bottom of gel columns. The device uses several short linear gel columns, each of a different percent acrylamide, to achieve a separation power similar to that of a long gradient gel. A "counter-free-flow" elution technique allows continuous and simultaneous fraction collection from multiple channels at low cost. Using the system with SDS-PAGE, 300 µg samples of protein can be separated and eluted into 48-96 fractions over a mass range of 10-150 kDa in 2.5 h. Each eluted protein can be recovered at 50% efficiency or higher in ~500 µL. The system can also be used for native gel electrophoresis, but protein aggregation limits the loading capacity to about 50 µg per channel and reduces resolution. This system has the potential to be coupled with mass spectrometry to achieve high-throughput protein identification.
Subjects
Polyacrylamide Gel Electrophoresis/instrumentation , Proteins/isolation & purification , Animals , Polyacrylamide Gel Electrophoresis/economics , Equipment Design , Humans , Mass Spectrometry , Molecular Weight , Proteins/analysis , Sample Size
ABSTRACT
Laser-scanning microscopy allows rapid acquisition of multi-channel data, paving the way for high-throughput, high-content analysis of large numbers of images. An inherent problem of using multiple fluorescent dyes is overlapping emission spectra, which results in channel cross-talk and reduces the ability to extract quantitative measurements. Traditional unmixing methods rely on measuring channel cross-talk and using fixed acquisition parameters, but these requirements are not suited to high-throughput processing. Here we present a simple automatic method to correct for channel cross-talk in multi-channel images using image data only. The method is independent of the acquisition parameters but requires some spatial separation between different dyes in the image. We evaluate the method by comparing the cross-talk levels it estimates to those measured directly from a standard fluorescent slide. The method is then applied to a high-throughput analysis pipeline that measures nuclear volumes and relative expression of gene products from three-dimensional, multi-channel fluorescence images of whole Drosophila embryos. Analysis of images before unmixing revealed an aberrant spatial correlation between measured nuclear volumes and the gene expression pattern in the shorter wavelength channel. Applying the unmixing algorithm before performing these analyses removed this correlation.
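To make the cross-talk idea concrete, the following Python sketch shows one simple way bleed-through could be estimated from image data alone and then subtracted. It is not the published algorithm. It assumes linear bleed-through from a source channel into a target channel, and it assumes that at least some bright source pixels contain no target dye, so that the smallest target/source ratio approximates the cross-talk fraction. All names and pixel values are hypothetical.

```python
def estimate_crosstalk(source, target, min_signal):
    """Smallest target/source ratio among pixels with a bright source channel.

    Raises ValueError if no pixel reaches min_signal.
    """
    ratios = [t / s for s, t in zip(source, target) if s >= min_signal]
    return min(ratios)


def unmix(source, target, crosstalk):
    """Subtract the estimated bleed-through, clamping at zero."""
    return [max(0.0, t - crosstalk * s) for s, t in zip(source, target)]


# Synthetic pixels: 10% of the source channel leaks into the target channel.
source = [10.0, 20.0, 30.0, 0.0, 5.0]
own_signal = [0.0, 0.0, 4.0, 8.0, 0.0]  # true target-dye signal
target = [own + 0.1 * s for own, s in zip(own_signal, source)]

crosstalk = estimate_crosstalk(source, target, min_signal=5.0)
corrected = unmix(source, target, crosstalk)
print(round(crosstalk, 3), [round(v, 3) for v in corrected])
```

The sketch depends on the same condition the abstract names: some spatial separation between dyes. If every bright source pixel also contained target dye, the minimum ratio would overestimate the cross-talk.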
ABSTRACT
The gain and loss of functional transcription factor binding sites has been proposed as a major source of evolutionary change in cis-regulatory DNA and gene expression. We have developed an evolutionary model to study binding-site turnover that uses multiple sequence alignments to assess the evolutionary constraint on individual binding sites, and to map gain and loss events along a phylogenetic tree. We apply this model to study the evolutionary dynamics of binding sites of the Drosophila melanogaster transcription factor Zeste, using genome-wide in vivo (ChIP-chip) binding data to identify functional Zeste binding sites, and the genome sequences of D. melanogaster, D. simulans, D. erecta, and D. yakuba to study their evolution. We estimate that more than 5% of functional Zeste binding sites in D. melanogaster were gained along the D. melanogaster lineage or lost along one of the other lineages. We find that Zeste-bound regions have a reduced rate of binding-site loss and an increased rate of binding-site gain relative to flanking sequences. Finally, we show that binding-site gains and losses are asymmetrically distributed with respect to D. melanogaster, consistent with lineage-specific acquisition and loss of Zeste-responsive regulatory elements.
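To make the idea of counting gain and loss events on a tree concrete, here is a small sketch that uses Fitch parsimony as a deliberately simpler stand-in for the evolutionary model described above; it is not the authors' method. The nested-tuple tree encoding, the species labels, and the function name `fitch` are assumptions made for this example. Site presence is coded as 1 and absence as 0.

```python
# Assumed rooted topology for the four species: ((mel, sim), (yak, ere)).
TREE = (("mel", "sim"), ("yak", "ere"))


def fitch(tree, states):
    """Bottom-up Fitch pass on a binary tree given as nested tuples.

    Returns (candidate_states_at_this_node, minimum_number_of_changes).
    `states` maps each leaf label to 1 (site present) or 0 (site absent).
    """
    if isinstance(tree, str):  # leaf node
        return {states[tree]}, 0
    left, right = tree
    left_set, left_cost = fitch(left, states)
    right_set, right_cost = fitch(right, states)
    common = left_set & right_set
    if common:
        # Children agree on at least one state: no extra change needed here.
        return common, left_cost + right_cost
    # Children disagree: one gain or loss must occur below this node.
    return left_set | right_set, left_cost + right_cost + 1


# A site found only in D. melanogaster needs one change, and the root is
# inferred as "absent", which is consistent with a gain on that lineage.
root_states, changes = fitch(TREE, {"mel": 1, "sim": 0, "yak": 0, "ere": 0})
```

Without an outgroup, a pattern such as presence in mel and sim but absence in yak and ere leaves the root state ambiguous, so parsimony alone cannot say whether the event was a gain or a loss.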
Subjects
Drosophila melanogaster/genetics , Drosophila melanogaster/metabolism , Response Elements/genetics , Transcription Factors/metabolism , Animals , Base Sequence , Binding Sites , Computational Biology , Conserved Sequence , DNA, Intergenic/genetics , DNA-Binding Proteins/genetics , Drosophila Proteins/genetics , Evolution, Molecular , Models, Genetic , Molecular Sequence Data , Promoter Regions, Genetic/genetics , Selection, Genetic , Sequence Alignment
ABSTRACT
Large-scale surveys in mammalian tissue culture cells suggest that the protein expressed at the median abundance is present at 8,000-16,000 molecules per cell and that differences in mRNA expression between genes explain only 10-40% of the differences in protein levels. We find, however, that these surveys have significantly underestimated protein abundances and the relative importance of transcription. Using individual measurements for 61 housekeeping proteins to rescale whole proteome data from Schwanhausser et al. (2011), we find that the median protein detected is expressed at 170,000 molecules per cell and that our corrected protein abundance estimates show a higher correlation with mRNA abundances than do the uncorrected protein data. In addition, we estimated the impact of further errors in mRNA and protein abundances using direct experimental measurements of these errors. The resulting analysis suggests that mRNA levels explain at least 56% of the differences in protein abundance for the 4,212 genes detected by Schwanhausser et al. (2011), though because one major source of error could not be estimated the true percent contribution should be higher. We also employed a second, independent strategy to determine the contribution of mRNA levels to protein expression. We show that the variance in translation rates directly measured by ribosome profiling is only 12% of that inferred by Schwanhausser et al. (2011), and that the measured and inferred translation rates correlate poorly (R² = 0.13). Based on this, our second strategy suggests that mRNA levels explain ~81% of the variance in protein levels. We also determined the percent contributions of transcription, RNA degradation, translation and protein degradation to the variance in protein abundances using both of our strategies.
While the magnitudes of the two estimates vary, they both suggest that transcription plays a more important role than the earlier studies implied and translation a much smaller role. Finally, the above estimates only apply to those genes whose mRNA and protein expression was detected. Based on a detailed analysis by Hebenstreit et al. (2012), we estimate that approximately 40% of genes in a given cell within a population express no mRNA. Since there can be no translation in the absence of mRNA, we argue that differences in translation rates can play no role in determining the expression levels for the ~40% of genes that are non-expressed.
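The general point that measurement error lowers the apparent share of protein variance explained by mRNA can be illustrated with the classical correction for attenuation. The sketch below is not the procedure used in the study; it applies Spearman's textbook formula, and the function names and the reliability values in the usage example are hypothetical. Here "reliability" means the fraction of observed variance that is true biological variance rather than measurement error.

```python
def pearson_r2(xs, ys):
    """Squared Pearson correlation between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        raise ValueError("a sequence with zero variance has no correlation")
    return (sxy * sxy) / (sxx * syy)


def disattenuated_r2(observed_r2, reliability_x, reliability_y):
    """Correct an observed R^2 for measurement error in both variables.

    Spearman's correction: r_true = r_obs / sqrt(rel_x * rel_y), hence
    R^2_true = R^2_obs / (rel_x * rel_y). The result is capped at 1.0.
    """
    for rel in (reliability_x, reliability_y):
        if not 0.0 < rel <= 1.0:
            raise ValueError("reliabilities must lie in (0, 1]")
    return min(1.0, observed_r2 / (reliability_x * reliability_y))


# Hypothetical numbers, chosen only to show the direction of the effect:
# an observed R^2 of 0.40 with reliabilities 0.8 (mRNA) and 0.9 (protein).
corrected = disattenuated_r2(0.40, 0.8, 0.9)
```

In practice the inputs would be log-transformed abundances, and the correction is only as good as the independent estimates of measurement error that feed the reliabilities.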
ABSTRACT
Animals comprise dynamic three-dimensional arrays of cells that express gene products in intricate spatial and temporal patterns that determine cellular differentiation and morphogenesis. A rigorous understanding of these developmental processes requires automated methods that quantitatively record and analyze complex morphologies and their associated patterns of gene expression at cellular resolution. Here we summarize light microscopy-based approaches to establish permanent, quantitative datasets (atlases) that record this information. We focus on experiments that capture data for whole embryos or large areas of tissue in three dimensions, often at multiple time points. We compare and contrast the advantages and limitations of different methods and highlight some of the discoveries made. We emphasize the need for interdisciplinary collaborations and integrated experimental pipelines that link sample preparation, image acquisition, image analysis, database design, visualization, and quantitative analysis.