ABSTRACT
Hydroxyl-radical footprinting (HRF) is a powerful method for probing structures of nucleic acid-protein complexes with single-nucleotide resolution in solution. To tap the full quantitative potential of HRF, we describe a protocol, hydroxyl-radical footprinting interpretation for DNA (HYDROID), to quantify HRF data and integrate them with atomistic structural models. The stages of the HYDROID protocol are extraction of the lane profiles from gel images, quantification of the DNA cleavage frequency at each nucleotide and theoretical estimation of the DNA cleavage frequency from atomistic structural models, followed by comparison of experimental and theoretical results. Example scripts for each step of HRF data analysis and interpretation are provided for several nucleosome systems; they can be easily adapted to analyze user data. As input, HYDROID requires polyacrylamide gel electrophoresis (PAGE) images of HRF products and optionally can use a molecular model of the DNA-protein complex. The HYDROID protocol can be used to quantify HRF over DNA regions of up to 100 nucleotides per gel image. In addition, it can be applied to the analysis of RNA-protein complexes and free RNA or DNA molecules in solution. Compared with other methods reported to date, HYDROID is unique in its ability to simultaneously integrate HRF data with the analysis of atomistic structural models. HYDROID is freely available. The complete protocol takes ~3 h. Users should be familiar with the command-line interface, the Python scripting language and Protein Data Bank (PDB) file formats. A graphical user interface (GUI) with basic functionality (HYDROID_GUI) is also available.
Subjects
DNA Footprinting/methods , DNA/chemistry , Hydroxyl Radical/chemistry , Protein Footprinting/methods , Proteins/chemistry , Software , DNA/metabolism , DNA Cleavage , DNA Footprinting/statistics & numerical data , Electrophoresis, Polyacrylamide Gel/statistics & numerical data , Humans , Models, Molecular , Nucleosomes/chemistry , Nucleosomes/metabolism , Protein Footprinting/statistics & numerical data , Proteins/metabolism , Solutions
ABSTRACT
A transcriptional regulatory network encompasses sets of genes (regulons) whose expression states are directly altered in response to an activating signal, mediated by trans-acting regulatory proteins and cis-acting regulatory sequences. Enumeration of these network components is an essential step toward the creation of a framework for systems-based analysis of biological processes. Profile-based methods for the detection of cis-regulatory elements are often applied to predict regulon members, but they suffer from poor specificity. In this report we describe Regulogger, a novel computational method that uses comparative genomics to eliminate spurious members of predicted gene regulons. Regulogger produces regulogs, sets of coregulated genes for which the regulatory sequence has been conserved across multiple organisms. The quantitative method assigns a confidence score to each predicted regulog member on the basis of the degree of conservation of protein sequence and regulatory mechanisms. When applied to a reference collection of regulons from Escherichia coli, Regulogger increased the specificity of predictions up to 25-fold over methods that use cis-element detection in isolation. The enhanced specificity was observed across a wide range of biologically meaningful parameter combinations, indicating a robust and broad utility for the method. The power of computational pattern discovery methods coupled with Regulogger to unravel transcriptional networks was demonstrated in an analysis of the genome of Staphylococcus aureus. A total of 125 regulogs were found in this organism, including both well-defined functional groups and a subset with unknown functions.
Subjects
Conserved Sequence/genetics , Regulon/genetics , Software , Staphylococcus aureus/genetics , Bacillus subtilis/genetics , Computational Biology , DNA Footprinting/methods , DNA Footprinting/statistics & numerical data , DNA, Bacterial/genetics , Genome, Bacterial , Phylogeny , Software Validation
ABSTRACT
Prediction of transcription-factor target sites in promoters remains difficult due to the short length and degeneracy of the target sequences. Although the use of orthologous sequences and phylogenetic footprinting approaches may help in the recognition of conserved and potentially functional sequences, correct alignment of the short transcription-factor binding sites can be problematic for established algorithms, especially when aligning more divergent species. Here, we report a novel phylogenetic footprinting approach, CONREAL, that uses biologically relevant information, that is, potential transcription-factor binding sites as represented by positional weight matrices, to establish anchors between orthologous sequences and to guide promoter sequence alignment. Comparison of the performance of CONREAL with the global alignment programs LAGAN and AVID using a reference data set shows that CONREAL performs equally well for closely related species like rodents and human, and has a clear added value for aligning promoter elements of more divergent species like human and fish, as it identifies conserved transcription-factor binding sites that are not found by other methods. CONREAL is accessible via a Web interface at http://conreal.niob.knaw.nl/.
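As a rough illustration of how positional weight matrix hits could be collected as anchor candidates, the sketch below scores every window of a sequence against a small log-odds matrix and keeps the windows above a threshold. The count matrix, the pseudocount, the threshold, and the function names are all hypothetical; the abstract does not describe CONREAL's actual scoring or anchoring procedure, so this is only a generic matrix scan.

```python
import math


def log_odds_matrix(counts, background=0.25, pseudocount=1.0):
    """Convert per-position base counts into log2 odds scores.

    Assumption: uniform background and a flat pseudocount (illustrative only).
    """
    matrix = []
    for column in counts:
        total = sum(column.values()) + 4 * pseudocount
        matrix.append({
            base: math.log2(((column.get(base, 0) + pseudocount) / total) / background)
            for base in "ACGT"
        })
    return matrix


def scan(sequence, matrix, threshold):
    """Return (start, window, score) for every window scoring at or above threshold."""
    width = len(matrix)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        score = sum(matrix[i][base] for i, base in enumerate(window))
        if score >= threshold:
            hits.append((start, window, score))
    return hits


# Hypothetical counts for a 4-bp site resembling "TATA".
counts = [{"T": 8}, {"A": 8}, {"T": 8}, {"A": 8}]
matrix = log_odds_matrix(counts)
hits = scan("GGTATACC", matrix, threshold=4.0)
```

Hits found at corresponding positions in two orthologous promoters could then be paired as candidate anchors; how such pairs are chosen and chained is outside this sketch.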
Subjects
Algorithms , Conserved Sequence/genetics , DNA Footprinting/methods , Phylogeny , Regulatory Sequences, Nucleic Acid/genetics , Sequence Alignment , Transcription Factors/genetics , Transcription Factors/metabolism , Animals , Binding Sites/genetics , DNA Footprinting/statistics & numerical data , DNA-Binding Proteins/genetics , Hepatocyte Nuclear Factor 3-beta , Humans , Internet , Mice , Nuclear Proteins/genetics , Promoter Regions, Genetic/genetics , Rats , Sequence Alignment/methods , Sequence Alignment/statistics & numerical data , Sequence Homology, Nucleic Acid , Takifugu/genetics , Zebrafish/genetics
ABSTRACT
BACKGROUND: Transcription regulatory regions in higher eukaryotes are often represented by cis-regulatory modules (CRM) and are responsible for the formation of specific spatial and temporal gene expression patterns. These extended regions, approximately 1 kb long, are found far from coding sequences and cannot be extracted from the genome on the basis of their position relative to the coding regions. RESULTS: To explore the feasibility of CRM extraction from a genome, we generated an original training set, containing annotated sequence data for most of the known developmental CRMs from Drosophila. Based on this set of experimental data, we developed a strategy for statistical extraction of cis-regulatory modules from the genome, using exhaustive analysis of local word frequency (LWF). To assess the performance of our analysis, we measured the correlation between predictions generated by the LWF algorithm and the distribution of conserved non-coding regions in a number of Drosophila developmental genes. CONCLUSIONS: In most of the cases tested, we observed high correlation (up to 0.6-0.8, measured on the entire gene locus) between the two independent techniques. We discuss computational strategies available for extraction of Drosophila CRMs and possible extensions of these methods.
Subjects
Drosophila melanogaster/genetics , Gene Expression Profiling/statistics & numerical data , Genome , Regulatory Sequences, Nucleic Acid/genetics , Animals , Computational Biology/methods , Computational Biology/statistics & numerical data , DNA/genetics , DNA Footprinting/methods , DNA Footprinting/statistics & numerical data , Gene Expression Profiling/methods , Gene Expression Regulation, Developmental/genetics , Internet , Phylogeny
ABSTRACT
Phylogenetic footprinting is an efficient approach for revealing potential transcription factor binding sites in promoter sequences. The idea rests on the assumption that functional sites in promoters evolve much more slowly than other regions that bear no conserved function. Therefore, potential transcription factor (TF) binding sites found in evolutionarily conserved regions of promoters are more likely to be "real" sites. The most difficult step of phylogenetic footprinting is the alignment of promoter sequences between different organisms (e.g., human and mouse). Conventional alignment methods often cannot align promoters because of the high level of sequence variability. We have developed a new alignment method that takes into account similarity in the distribution of potential binding sites (motif-based alignment). This method has been used effectively for promoter alignment and for revealing new potential binding sites for various transcription factors. We performed systematic phylogenetic footprinting of human/mouse conserved non-coding sequences (CNS). In total, 60 thousand potential binding sites were revealed in the human and mouse genomes. We have developed a database of the predicted potential TF binding sites. AVAILABILITY: http://compel.bionet.nsc.ru/FunSite/footprint/; www.gene-regulation.com/.
Subjects
DNA Footprinting/statistics & numerical data , Genomics/statistics & numerical data , Phylogeny , Algorithms , Animals , Base Sequence , Binding Sites/genetics , Conserved Sequence , DNA/genetics , DNA/metabolism , Genome, Human , Humans , Mice , Molecular Sequence Data , Promoter Regions, Genetic , Sequence Alignment/statistics & numerical data , Transcription, Genetic
ABSTRACT
As the number of sequenced genomes has grown, the questions of which species are most useful and how many genomes are sufficient for comparison have become increasingly important for comparative genomics studies. We have systematically addressed these questions with respect to phylogenetic footprinting of transcription factor (TF) binding sites in the gamma-proteobacteria, and have evaluated the statistical significance of our motif predictions. We used a study set of 166 Escherichia coli genes that have experimentally identified TF binding sites upstream of the gene, with orthologous data from nine additional gamma-proteobacteria for phylogenetic footprinting. Just three species were sufficient for approximately 74.0% of the motif predictions to correspond to the experimentally reported E. coli sites, and important characteristics to consider when choosing species were phylogenetic distance, genome size, and natural habitat. We also performed simulations using randomized data to determine the critical maximum a posteriori probability (MAP) values for statistical significance of our motif predictions (P = 0.05). Approximately 60% of motif predictions containing sites from just three species had average MAP values above these critical MAP values. The inclusion of a species very closely related to E. coli increased the number of statistically significant motif predictions, despite substantially increasing the critical MAP value.
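The idea of deriving a critical score from randomized data can be sketched as follows. This is a minimal illustration, not the authors' procedure: `toy_score` is a hypothetical stand-in for a motif score (it is not the MAP value), shuffling a single sequence is an assumed randomization scheme, and the number of replicates is arbitrary. The critical value is taken as the null score that at most a fraction `alpha` of randomized scores exceed.

```python
import random


def critical_value(null_scores, alpha=0.05):
    """Null score exceeded by at most a fraction alpha of the randomized scores."""
    ordered = sorted(null_scores)
    allowed = int(alpha * len(ordered))  # how many null scores may exceed the cutoff
    return ordered[len(ordered) - 1 - allowed]


def shuffled(sequence, rng):
    """Randomize a sequence while preserving its base composition."""
    letters = list(sequence)
    rng.shuffle(letters)
    return "".join(letters)


def toy_score(sequence):
    """Hypothetical stand-in for a motif score: length of the longest single-base run."""
    best = run = 1
    for previous, current in zip(sequence, sequence[1:]):
        run = run + 1 if current == previous else 1
        best = max(best, run)
    return best


rng = random.Random(0)  # fixed seed so the sketch is repeatable
observed = "ACGTTTTTTGCA"
null_scores = [toy_score(shuffled(observed, rng)) for _ in range(1000)]
cutoff = critical_value(null_scores)
significant = toy_score(observed) > cutoff
```

A prediction would be called significant at the chosen level only when its score exceeds `cutoff`; with a real motif finder the scoring function and the randomization scheme would both need to match the study design.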
Subjects
Genes, Bacterial/genetics , Genome, Bacterial , Transcription Factors/metabolism , Binding Sites/genetics , Computational Biology/methods , Computational Biology/statistics & numerical data , DNA Footprinting/methods , DNA Footprinting/statistics & numerical data , Gammaproteobacteria/genetics , Gram-Negative Bacteria/genetics , Likelihood Functions , Phylogeny , RNA, Bacterial/genetics , RNA, Bacterial/metabolism , RNA, Ribosomal, 16S/genetics , Sequence Homology, Nucleic Acid , Species Specificity
ABSTRACT
Phylogenetic footprinting is a technique that identifies regulatory elements by finding unusually well conserved regions in a set of orthologous noncoding DNA sequences from multiple species. We introduce a new motif-finding problem, the Substring Parsimony Problem, which is a formalization of the ideas behind phylogenetic footprinting, and we present an exact dynamic programming algorithm to solve it. We then present a number of algorithmic optimizations that allow our program to run quickly on most biologically interesting datasets. We show how to handle data sets in which only an unknown subset of the sequences contains the regulatory element. Finally, we describe how to empirically assess the statistical significance of the motifs found. Each technique is implemented and successfully identifies a number of known binding sites, as well as several highly conserved but uncharacterized regions. The program is available at http://bio.cs.washington.edu/software.html.
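One way to picture a dynamic program for substring parsimony is the brute-force sketch below. It assumes a rooted tree given as nested tuples with leaf sequences as strings, Hamming distance between k-mers as the edge cost, and enumeration of all 4^k candidate k-mers at every node; these representation choices and the function names are illustrative assumptions, and none of the optimizations mentioned in the abstract are included. For each node and each k-mer, the table holds the cheapest cost of the subtree when that node is labeled with that k-mer.

```python
from itertools import product


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def substring_parsimony(tree, k, alphabet="ACGT"):
    """Minimum parsimony score over one k-mer per leaf.

    tree: a leaf is a sequence string; an internal node is a tuple of subtrees.
    """
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    inf = float("inf")

    def cost_table(node):
        if isinstance(node, str):
            # A leaf may only be labeled with a k-mer that occurs in its sequence.
            present = {node[i:i + k] for i in range(len(node) - k + 1)}
            return {s: (0 if s in present else inf) for s in kmers}
        child_tables = [cost_table(child) for child in node]
        table = {}
        for s in kmers:
            # Each child independently picks its best label t given parent label s.
            table[s] = sum(
                min(cost + hamming(s, t) for t, cost in child.items())
                for child in child_tables
            )
        return table

    return min(cost_table(tree).values())


# Three sequences that all contain "ACG": a perfectly conserved 3-mer scores 0.
score = substring_parsimony((("ACGT", "ACGA"), "TTACG"), 3)
```

The running time of this sketch grows as 4^(2k) per tree edge, so it is only practical for very short motifs; recovering the chosen substrings would additionally require storing the minimizing label at each step.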
Subjects
Algorithms , DNA Footprinting/statistics & numerical data , Phylogeny , Computational Biology , Databases, Nucleic Acid , Genes, Regulator , Software
Subjects
DNA Footprinting/methods , DNA/chemistry , Pharmaceutical Preparations/chemistry , DNA Footprinting/statistics & numerical data , Deoxyribonuclease I , Diethyl Pyrocarbonate , Edetic Acid/analogs & derivatives , Endodeoxyribonucleases , Hydroxyl Radical , Macromolecular Substances , Micrococcal Nuclease , Molecular Probes , Osmium Tetroxide , Potassium Permanganate
Subjects
DNA Footprinting/methods , DNA/chemistry , Imidazoles/chemistry , Pyrroles/chemistry , Base Sequence , Binding Sites , DNA Footprinting/statistics & numerical data , DNA Primers/genetics , Deoxyribonuclease I , Drug Design , Edetic Acid/analogs & derivatives , Ligands , Macromolecular Substances , Molecular Sequence Data
ABSTRACT
In this paper we present our experience in using the immunogenetic compatibility test to establish biological relationships in typical trios. In the 1970s our Service used only the erythrocyte systems, which allowed us to exclude only 20 percent of the men falsely alleged to be fathers. We later added the study of HLA system antigens, which made it possible to include the alleged father as the biological father, with probabilities of paternity between 90 and 98 percent, depending on the population frequency of the obligatory HLA haplotype. The development of molecular biology techniques has made it possible to add DNA polymorphism analysis to the expert examinations performed in our laboratory. We propose that the immunogenetic compatibility test comprise the study of phenotypic markers (antigens of the erythrocyte systems and of the HLA system) and genotypic markers (DNA polymorphism). Using this methodology allows the inclusion of paternity with a reliability of results above 99.99 percent.
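The dependence of the probability of paternity on haplotype frequency can be illustrated with the standard textbook relation, which the abstract itself does not state: with a prior of 0.5, the probability of paternity is W = PI / (PI + 1), where the paternity index PI is the chance that the alleged father transmits the obligatory haplotype divided by its population frequency. The sketch assumes an alleged father carrying one copy of the haplotype (transmission chance 0.5) and independent markers; the frequencies used are hypothetical.

```python
def paternity_index_heterozygous(haplotype_frequency):
    """PI when the alleged father carries one copy of the obligatory haplotype."""
    return 0.5 / haplotype_frequency


def probability_of_paternity(paternity_indices, prior=0.5):
    """Combine per-marker indices (assumed independent) into a posterior probability."""
    combined = 1.0
    for index in paternity_indices:
        combined *= index
    return combined * prior / (combined * prior + (1.0 - prior))


# Hypothetical haplotype frequencies of 5% and 1%.
common = probability_of_paternity([paternity_index_heterozygous(0.05)])
rare = probability_of_paternity([paternity_index_heterozygous(0.01)])
```

Under these assumptions a haplotype with 5 percent frequency gives about 0.91 and one with 1 percent frequency gives about 0.98, which shows why rarer obligatory haplotypes yield higher probabilities and why adding further independent markers pushes the combined value toward 1.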
Subjects
Humans , Blood Grouping and Crossmatching , DNA Footprinting/statistics & numerical data , Genetic Markers , Immunogenetics/methods , Polymorphism, Genetic
ABSTRACT
In recent years DNA technology has become a powerful tool for individual identification, even when only skeletal remains are available. A case is illustrated in which the identification test was performed by comparing the DNA of the presumed parents with DNA extracted from the bone of a female minor, using the markers HLA-DQA1, LDLR, GYPA, HBGG, D7S8, Gc, and D1S80. The genetic profiles obtained allowed a positive identification of the remains as belonging to the missing girl. This methodology remains available for future forensic cases that warrant it.