Results 1 - 20 of 39
1.
Nucleic Acids Res ; 29(23): 4881-91, 2001 Dec 01.
Article in English | MEDLINE | ID: mdl-11726698

ABSTRACT

The sequence logo for DNA binding sites of the bacteriophage P1 replication protein RepA shows unusually high sequence conservation (approximately 2 bits) at a minor groove that faces RepA. However, B-form DNA can support only 1 bit of sequence conservation via contacts into the minor groove. The high conservation in RepA sites therefore implies a distorted DNA helix with direct or indirect contacts to the protein. Here I show that a high minor groove conservation signature also appears in sequence logos of sites for other replication origin binding proteins (Rts1, DnaA, P4 alpha, EBNA1, ORC) and promoter binding proteins (sigma(70), sigma(D) factors). This finding implies that DNA binding proteins generally use non-B-form DNA distortion such as base flipping to initiate replication and transcription.


Subject(s)
DNA Replication, DNA-Binding Proteins/metabolism, DNA/chemistry, Replication Origin, Transcription Initiation Site, Genetic Transcription, Viral Proteins, Bacterial Proteins/metabolism, Base Sequence, Binding Sites, Conserved Sequence, DNA/metabolism, DNA Helicases/metabolism, DNA-Directed RNA Polymerases/metabolism, Epstein-Barr Virus Nuclear Antigens/metabolism, Nucleic Acid Conformation, Origin Recognition Complex, Genetic Promoter Regions, Protein Binding, RNA Nucleotidyltransferases/metabolism, Sigma Factor/metabolism
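
The 1-bit and 2-bit figures in the abstract above can be read as counts of distinguishable alternatives. The sketch below assumes equally likely bases and assumes that a contact into the minor groove of B-form DNA can only separate A/T pairs from G/C pairs; the function name is illustrative and does not come from the paper.

```python
from math import log2


def max_conservation_bits(distinguishable_classes: int) -> float:
    """Upper bound on conservation (bits) when a contact can tell apart
    this many equally likely classes of base."""
    return log2(distinguishable_classes)


# Major groove: all four bases can be told apart.
major_groove_bits = max_conservation_bits(4)

# Minor groove of B-form DNA (assumption): only two classes, A/T versus G/C.
minor_groove_bits = max_conservation_bits(2)

print(major_groove_bits, minor_groove_bits)
```

Under these assumptions, a position facing the minor groove that shows about 2 bits exceeds the 1-bit bound, which is the discrepancy the abstract attributes to distorted DNA.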
2.
Nucleic Acids Res ; 28(14): 2794-9, 2000 Jul 15.
Article in English | MEDLINE | ID: mdl-10908337

ABSTRACT

How do genetic systems gain information by evolutionary processes? Answering this question precisely requires a robust, quantitative measure of information. Fortunately, 50 years ago Claude Shannon defined information as a decrease in the uncertainty of a receiver. For molecular systems, uncertainty is closely related to entropy and hence has clear connections to the Second Law of Thermodynamics. These aspects of information theory have allowed the development of a straightforward and practical method of measuring information in genetic control systems. Here this method is used to observe information gain in the binding sites for an artificial 'protein' in a computer simulation of evolution. The simulation begins with zero information and, as in naturally occurring genetic systems, the information measured in the fully evolved binding sites is close to that needed to locate the sites in the genome. The transition is rapid, demonstrating that information gain can occur by punctuated equilibrium.


Subject(s)
Binding Sites/genetics, Molecular Evolution, Information Theory, Base Sequence, Biological Models, Molecular Sequence Data, Genetic Selection, Software, Thermodynamics
3.
Nucleic Acids Res ; 29(23): 4892-900, 2001 Dec 01.
Article in English | MEDLINE | ID: mdl-11726699

ABSTRACT

The RepA protein from bacteriophage P1 binds DNA to initiate replication. RepA covers one face of the DNA and the binding site has a completely conserved T that directly faces RepA from the minor groove at position +7. Although all four bases can be distinguished through contacts in the major groove of B-form DNA, contacts in the minor groove cannot easily distinguish between A and T bases. Therefore the 100% conservation at this position cannot be accounted for by direct contacts approaching into the minor groove of B-form DNA. RepA binding sites with modified base pairs at position +7 were used to investigate contacts with RepA. The data show that RepA contacts the N3 proton of T at position +7 and that the T=A hydrogen bonds are already broken in the DNA before RepA binds. To accommodate the N3 proton contact the T(+7)/A(+7)' base pair must be distorted. One possibility is that T(+7) is flipped out of the helix. The energetics of the contact allows RepA to distinguish between all four bases, accounting for the observed high sequence conservation. After protein binding, base pair distortion or base flipping could initiate DNA melting as the second step in DNA replication.


Subject(s)
DNA Helicases, DNA Replication, DNA/chemistry, DNA/metabolism, Proteins/metabolism, Replication Origin, Thymine/metabolism, Trans-Activators, Base Pairing, Base Sequence, Binding Sites, Conserved Sequence, DNA-Binding Proteins/metabolism, Electrophoretic Mobility Shift Assay, Hydrogen Bonding, Genetic Models, Nucleic Acid Conformation, Protein Binding, Protons
4.
Nucleic Acids Res ; 29(7): 1443-52, 2001 Apr 01.
Article in English | MEDLINE | ID: mdl-11266544

ABSTRACT

Defects in the XPG DNA repair endonuclease gene can result in the cancer-prone disorders xeroderma pigmentosum (XP) or the XP-Cockayne syndrome complex. While the XPG cDNA sequence was known, determination of the genomic sequence was required to understand its different functions. In cells from normal donors, we found that the genomic sequence of the human XPG gene spans 30 kb, contains 15 exons that range from 61 to 1074 bp and 14 introns that range from 250 to 5763 bp. Analysis of the splice donor and acceptor sites using an information theory-based approach revealed three splice sites with low information content, which are components of the minor (U12) spliceosome. We identified six alternatively spliced XPG mRNA isoforms in cells from normal donors and from XPG patients: partial deletion of exon 8, partial retention of intron 8, two with alternative exons (in introns 1 and 6) and two that retained complete introns (introns 3 and 9). The amount of alternatively spliced XPG mRNA isoforms varied in different tissues. Most alternative splice donor and acceptor sites had a relatively high information content, but one has the U12 spliceosome sequence. A single nucleotide polymorphism has allele frequencies of 0.74 for 3507G and 0.26 for 3507C in 91 donors. The human XPG gene contains multiple splice sites with low information content in association with multiple alternatively spliced isoforms of XPG mRNA.


Subject(s)
DNA-Binding Proteins/genetics, Alternative Splicing, Base Sequence, Cell Line, DNA/chemistry, DNA/genetics, Endonucleases, Exons, Genes/genetics, Humans, Introns, Male, Molecular Sequence Data, Nuclear Proteins, Single Nucleotide Polymorphism, Messenger RNA/genetics, Messenger RNA/metabolism, Reverse Transcriptase Polymerase Chain Reaction, DNA Sequence Analysis, Tissue Distribution, Transcription Factors
5.
J Mol Biol ; 228(4): 1124-36, 1992 Dec 20.
Article in English | MEDLINE | ID: mdl-1474582

ABSTRACT

An information analysis of the 5' (donor) and 3' (acceptor) sequences spanning the ends of nearly 1800 human introns has provided evidence for structural features of splice sites that bear upon spliceosome evolution and function: (1) 82% of the sequence information (i.e. sequence conservation) at donor junctions and 97% of the sequence information at acceptor junctions is confined to the introns, allowing codon choices throughout exons to be largely unrestricted. The distribution of information at intron-exon junctions is also described in detail and compared with footprints. (2) Acceptor sites are found to possess enough information to be located in the transcribed portion of the human genome, whereas donor sites possess about one bit less than the information needed to locate them independently. This difference suggests that acceptor sites are located first in humans and, having been located, reduce by a factor of two the number of alternative sites available as donors. Direct experimental evidence exists to support this conclusion. (3) The sequences of donor and acceptor splice sites exhibit a striking similarity. This suggests that the two junctions derive from a common ancestor and that during evolution the information of both sites shifted onto the intron. If so, the protein and RNA components that are found in contemporary spliceosomes, and which are responsible for recognizing donor and acceptor sequences, should also be related. This conclusion is supported by the common structures found in different parts of the spliceosome.


Subject(s)
Biological Evolution, Introns/genetics, RNA Splicing/genetics, Spliceosomes, Base Sequence, Conserved Sequence, Energy Metabolism, Exons/genetics, Human Genome, Humans, Information Theory, Molecular Sequence Data, Monte Carlo Method
6.
J Mol Biol ; 233(2): 219-30, 1993 Sep 20.
Article in English | MEDLINE | ID: mdl-8377199

ABSTRACT

The replication initiator protein RepA of plasmid P1 can bind to 14 sites on the plasmid. These sites are variously used to autoregulate RepA synthesis and for initiation and control of DNA replication. Analysis of information (degree of conservation) at the sites revealed three sequence patches of high conservation. By saturation mutagenesis, the conservation at the outer two patches was found to contribute to RepA binding more critically. The guanine bases that are likely to contact RepA through the major groove were identified by methylation interference and methylation protection experiments. These bases mapped to the outer two patches and were separated by one turn of the helix. Therefore, they belong to major grooves on the same face of DNA. All backbone contacts of the protein, determined by hydroxyl radical footprinting, also mapped to the same face. We conclude from this that RepA binds to its site on one face of the DNA. Information analysis of binding sites for several prokaryotic repressors and activators, where the nature of DNA-protein contacts is known, revealed a correlation between the positions of high conservation and the positions of major grooves that faced the protein. The middle patch of high conservation in the RepA binding sites is an exception since in this region a minor groove is likely to face the protein. The simplest model for minor groove contacts suggests that in B-form DNA a T.A base-pair cannot easily be distinguished from an A.T pair by inspection of the minor groove. Yet in the RepA site, a T→A mutation in the middle patch significantly affects binding. Therefore, the simplest models for both minor and major groove contacts are unlikely. It is possible that the patch determines the proper conformation of the site and thereby contributes to recognition indirectly.


Subject(s)
Bacterial Proteins/metabolism, DNA Helicases, DNA Replication, Bacterial DNA/metabolism, DNA-Binding Proteins/metabolism, Escherichia coli/metabolism, Proteins, Trans-Activators, Bacterial Proteins/biosynthesis, Base Sequence, Binding Sites, Molecular Cloning, Conserved Sequence, Bacterial DNA/chemistry, Kinetics, Mathematics, Molecular Models, Theoretical Models, Molecular Sequence Data, Site-Directed Mutagenesis, Nucleic Acid Conformation, Oligodeoxyribonucleotides/metabolism, Plasmids, Recombinant Proteins/biosynthesis, Recombinant Proteins/metabolism, Repressor Proteins/metabolism, Genetic Templates
7.
J Mol Biol ; 188(3): 415-31, 1986 Apr 05.
Article in English | MEDLINE | ID: mdl-3525846

ABSTRACT

Repressors, polymerases, ribosomes and other macromolecules bind to specific nucleic acid sequences. They can find a binding site only if the sequence has a recognizable pattern. We define a measure of the information ($R_{sequence}$) in the sequence patterns at binding sites. It allows one to investigate how information is distributed across the sites and to compare one site to another. One can also calculate the amount of information ($R_{frequency}$) that would be required to locate the sites, given that they occur with some frequency in the genome. Several Escherichia coli binding sites were analyzed using these two independent empirical measurements. The two amounts of information are similar for most of the sites we analyzed. In contrast, bacteriophage T7 RNA polymerase binding sites contain about twice as much information as is necessary for recognition by the T7 polymerase, suggesting that a second protein may bind at T7 promoters. The extra information can be accounted for by a strong symmetry element found at the T7 promoters. This element may be an operator. If this model is correct, these promoters and operators do not share much information. The comparisons between $R_{sequence}$ and $R_{frequency}$ suggest that the information at binding sites is just sufficient for the sites to be distinguished from the rest of the genome.


Subject(s)
Binding Sites, Bacterial DNA/genetics, DNA-Binding Proteins, Serine Endopeptidases, Bacterial Proteins/genetics, Base Sequence, Bacterial DNA/metabolism, DNA-Directed RNA Polymerases/genetics, Escherichia coli/genetics, Escherichia coli/metabolism, Lac Operon, Genetic Operator Regions, Operon, Repressor Proteins, Ribosomes/metabolism, Statistics as Topic, T-Phages/genetics, Tryptophan/genetics, Viral Proteins, Viral Regulatory and Accessory Proteins
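
The two measures compared in the abstract above can be sketched in a few lines. This is a simplified illustration, not the paper's procedure: it assumes equiprobable genomic bases (a 2-bit maximum per position), omits the small-sample correction, and uses made-up aligned sites. The function names are hypothetical.

```python
from collections import Counter
from math import log2


def r_sequence(sites):
    """Information in the site pattern: sum over positions of 2 - H(l), in bits.
    Assumes equiprobable bases and applies no small-sample correction."""
    total = 0.0
    for column in zip(*sites):
        n = len(column)
        uncertainty = -sum((c / n) * log2(c / n) for c in Counter(column).values())
        total += 2.0 - uncertainty
    return total


def r_frequency(genome_positions, num_sites):
    """Information needed to pick num_sites positions out of genome_positions."""
    return log2(genome_positions / num_sites)


# Made-up aligned binding sites, for illustration only.
example_sites = ["ACGT", "ACGA", "ACCT", "ACGT"]

print(round(r_sequence(example_sites), 3))
print(r_frequency(4096, 16))
```

When the two numbers are close, the pattern carries roughly the information needed to find the sites, which is the comparison the abstract reports for most of the sites analyzed.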
8.
J Mol Biol ; 313(1): 215-28, 2001 Oct 12.
Article in English | MEDLINE | ID: mdl-11601857

ABSTRACT

During translational initiation in prokaryotes, the 3' end of the 16S rRNA binds to a region just upstream of the initiation codon. The relationship between this Shine-Dalgarno (SD) region and the binding of ribosomes to translation start-points has been well studied, but a unified mathematical connection between the SD, the initiation codon and the spacing between them has been lacking. Using information theory, we constructed a model that treats these three components uniformly by assigning to the SD and the initiation region (IR) conservations in bits of information, and by assigning to the spacing an uncertainty, also in bits. To build the model, we first aligned the SD region by maximizing the information content there. The ease of this process confirmed the existence of the SD pattern within a set of 4122 reviewed and revised Escherichia coli gene starts. This large data set allowed us to show graphically, by sequence logos, that the spacing between the SD and the initiation region affects both the SD site conservation and its pattern. We used the aligned SD, the spacing, and the initiation region to model ribosome binding and to identify gene starts that do not conform to the ribosome binding site model. A total of 569 experimentally proven starts are more conserved (have higher information content) than the full set of revised starts, which probably reflects an experimental bias against the detection of gene products that have inefficient ribosome binding sites. Models were refined cyclically by removing non-conforming weak sites. After this procedure, models derived from either the original or the revised gene start annotation were similar. Therefore, this information theory-based technique provides a method for easily constructing biologically sensible ribosome binding site models. Such models should be useful for refining gene-start predictions of any sequenced bacterial genome.


Subject(s)
Escherichia coli Proteins/genetics, Escherichia coli Proteins/metabolism, Escherichia coli/genetics, Bacterial Genes/genetics, Translational Peptide Chain Initiation/genetics, Ribosomes/chemistry, Ribosomes/metabolism, Base Sequence, Binding Sites, Initiator Codon/genetics, Databases as Topic, Escherichia coli Proteins/chemistry, Information Theory, Biological Models, Nucleic Acid Conformation, Pliability, Protein Binding, RNA Stability, Bacterial RNA/chemistry, Bacterial RNA/genetics, Bacterial RNA/metabolism, Messenger RNA/chemistry, Messenger RNA/genetics, Messenger RNA/metabolism, RNA-Binding Proteins/chemistry, RNA-Binding Proteins/genetics, RNA-Binding Proteins/metabolism, Nucleic Acid Regulatory Sequences/genetics, Ribosomes/genetics
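
One way to read the abstract above is that the SD conservation, the initiation region conservation and the spacing cost all live on the same bit scale, so they can be combined by simple addition and subtraction. The sketch below is an assumed reading rather than the paper's exact model: it charges each spacing its surprisal, estimated from a made-up list of observed spacings, and all names and numbers are hypothetical.

```python
from collections import Counter
from math import log2


def gap_surprisal(spacing, observed_spacings):
    """Cost in bits of a given SD-to-start spacing, estimated as -log2 of
    its observed frequency. Unseen spacings get an infinite cost."""
    p = Counter(observed_spacings)[spacing] / len(observed_spacings)
    return float("inf") if p == 0 else -log2(p)


def total_information(ri_sd, ri_ir, spacing, observed_spacings):
    """Combine the two conservations (bits) and subtract the spacing cost (bits)."""
    return ri_sd + ri_ir - gap_surprisal(spacing, observed_spacings)


# Made-up spacings and conservation values, for illustration only.
spacings = [7, 7, 8, 9]
print(total_information(5.0, 6.0, 7, spacings))
```

A real model would need a smoothed spacing distribution so that rare but valid spacings are not assigned an infinite cost.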
9.
J Invest Dermatol ; 111(5): 791-6, 1998 Nov.
Article in English | MEDLINE | ID: mdl-9804340

ABSTRACT

A 4-year-old boy of Korean ancestry had xeroderma pigmentosum (XP) with sun sensitivity, multiple cutaneous neoplasms, and inability to speak. Neurologic examination revealed hyperactivity and autistic features without typical XP neurologic abnormalities. Cultured skin fibroblasts (XP22BE) showed decreased post-UV survival, reduced post-UV plasmid host cell reactivation and defective DNA repair (16% of normal unscheduled DNA synthesis in intact cells and undetectable excision repair in a cell-free extract). In vitro and in vivo complementation assigned XP22BE to XP group C (XPC) and a markedly reduced level of XPC mRNA was found. Two XPC cDNA bands were identified. One band had a deletion of 161 bases comprising the entire exon 9, which resulted in premature termination of the mutant XPC mRNA. The larger band also had the same deletion of exon 9 but, in addition, had an insertion of 155 bases in its place (exon 9a), resulting in an in-frame XPC mRNA. Genomic DNA analysis revealed a T→G mutation at the splice donor site of XPC exon 9, which markedly reduced its information content. The 155 base pair XPC exon 9a insertion was located in intron 9 and was flanked by strong splice donor and acceptor sequences. Analysis of the patient's blood showed persistently low levels of glycine (68 µM; NL, 125-318 µM). Normal glycine levels were maintained with oral glycine supplements and his hyperactivity diminished. These data provide evidence of an association of an XPC splice site mutation with autistic neurologic features and hypoglycinemia.


Subject(s)
Autistic Disorder/complications, DNA-Binding Proteins/genetics, Glycine/blood, Xeroderma Pigmentosum/genetics, Alternative Splicing, Northern Blotting, Preschool Child, Human Chromosomes Pair 3, DNA/genetics, DNA Repair, Fibroblasts/radiation effects, Genetic Markers/genetics, Humans, Male, Microsatellite Repeats/genetics, Mutation, Survival Rate, Genetic Transcription, Ultraviolet Rays, Xeroderma Pigmentosum/complications
10.
Gene ; 215(1): 111-22, 1998 Jul 17.
Article in English | MEDLINE | ID: mdl-9666097

ABSTRACT

Mutations in the human ABCR gene have been associated with the autosomal recessive Stargardt disease (STGD), retinitis pigmentosa (RP19), and cone-rod dystrophy (CRD) and have also been found in a fraction of age-related macular degeneration (AMD) patients. The ABCR gene is a member of the ATP-binding cassette (ABC) transporter superfamily and encodes a rod photoreceptor-specific membrane protein. The cytogenetic location of the ABCR gene was refined to 1p22.3-1p22.2. The intron/exon structure was determined for the ABCR gene from overlapping genomic clones. ABCR spans over 100 kb and comprises 50 exons. Intron/exon splice site sequences are presented for all exons and analyzed for information content ($R_i$). Nine splice site sequence variants found in STGD and AMD patients are evaluated as potential mutations. The localization of splice sites reveals a high degree of conservation between other members of the ABC1 subfamily, e.g. the mouse Abc1 gene. Analysis of the 870-bp 5' upstream of the transcription start sequence reveals multiple putative photoreceptor-specific regulatory elements including a novel retina-specific transcription factor binding site. These results will be useful in further mutational screening of the ABCR gene in various retinopathies and for determining the substrate and/or function of this photoreceptor-specific ABC transporter.


Subject(s)
ATP-Binding Cassette Transporters/genetics, Genes/genetics, Alternative Splicing/genetics, Base Sequence, Binding Sites/genetics, Conserved Sequence/genetics, DNA/chemistry, DNA/genetics, Molecular Evolution, Exons/genetics, Humans, Introns/genetics, Molecular Sequence Data, Mutation/genetics, Genetic Promoter Regions/genetics, Messenger RNA/chemistry, Messenger RNA/genetics, DNA Sequence Analysis
11.
Methods Enzymol ; 274: 445-55, 1996.
Article in English | MEDLINE | ID: mdl-8902824

ABSTRACT

DNA sequences to which the OxyR protein binds under oxidizing conditions were analyzed by the sequence logo method, a quantitative graphic technique based on information theory. A sequence logo shows both the sequence conservation and the frequencies of bases at each position in a site. Unlike the consensus sequence, the sequence logo analysis revealed that OxyR should bind to four major grooves of DNA. This was later confirmed by experiments. Detailed interpretation of the sequence logo also allowed the prediction of likely major and minor groove OxyR-DNA base contacts, consistent with available experimental results. Because the sequence logo shows the original base frequencies in a clear, easily interpreted graphic that does not distort the data, highly refined analysis of binding site contacts becomes easy. Not only can these methods be applied to any DNA sequence binding site, they can also be applied to sites on RNA and proteins.


Subject(s)
Base Sequence, DNA-Binding Proteins, DNA/chemistry, DNA/metabolism, Nucleic Acid Conformation, Repressor Proteins/metabolism, Transcription Factors/metabolism, Bacterial Proteins/metabolism, Base Composition, Binding Sites, Factual Databases, Escherichia coli/metabolism, Escherichia coli Proteins, Molecular Sequence Data
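
The abstract above says a sequence logo shows both conservation and base frequencies at each position. A minimal sketch of how one logo column could be laid out follows, assuming the stack height is the conservation in bits and each letter's share of the stack equals its frequency. The small-sample correction is omitted and the function name is hypothetical.

```python
from collections import Counter
from math import log2


def logo_column(column):
    """Return (base, letter_height_in_bits) pairs for one aligned position,
    ordered from the bottom of the stack to the top (most frequent on top)."""
    n = len(column)
    freqs = {base: count / n for base, count in Counter(column).items()}
    uncertainty = -sum(f * log2(f) for f in freqs.values())
    stack_height = 2.0 - uncertainty  # conservation at this position
    letters = [(base, f * stack_height) for base, f in freqs.items()]
    return sorted(letters, key=lambda item: item[1])


# Made-up column of bases from four aligned sites.
for base, height in logo_column("AAAT"):
    print(base, round(height, 3))
```

Because the letter heights sum to the stack height, the display keeps the original frequencies readable, which is the property the abstract contrasts with a consensus sequence.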
12.
Biotechniques ; 11(6): 733-4, 736, 738, 1991 Dec.
Article in English | MEDLINE | ID: mdl-1809325

ABSTRACT

An automated kinetic assay for beta-galactosidase activity in Escherichia coli was developed to permit the measurement of many independent samples simultaneously. Bacteria are grown, lysed from without (by adsorption of a high multiplicity of bacteriophage T4) and assayed in microtiter plates with 96 wells. Absorbance data are collected and analyzed by computer. The growth and lysis procedure, apparatus and software used in this assay can be used for other spectrophotometric enzyme assays.


Subject(s)
Escherichia coli/enzymology, beta-Galactosidase/analysis, Automation, Kinetics, Ultraviolet Spectrophotometry
15.
J Theor Biol ; 148(1): 125-37, 1991 Jan 07.
Article in English | MEDLINE | ID: mdl-2016881

ABSTRACT

Single molecules perform a variety of tasks in cells, from replicating, controlling and translating the genetic material to sensing the outside environment. These operations all require that specific actions take place. In a sense, each molecule must make tiny decisions. To make a decision, each "molecular machine" must dissipate an energy $P_y$ in the presence of thermal noise $N_y$. The number of binary decisions that can be made by a machine which has $d_{space}$ independently moving parts is the "machine capacity" $C_y = d_{space} \log_2[(P_y + N_y)/N_y]$. This formula is closely related to Shannon's channel capacity for communications systems, $C = W \log_2[(P + N)/N]$. This paper shows that the minimum amount of energy that a molecular machine must dissipate in order to gain one bit of information is $\epsilon_{min} = k_B T \ln(2)$ joules/bit. This equation is derived in two distinct ways. The first derivation begins with the Second Law of Thermodynamics, which shows that the statement that there is a minimum energy dissipation is a restatement of the Second Law of Thermodynamics. The second derivation begins with the machine capacity formula, which shows that the machine capacity is also related to the Second Law of Thermodynamics. One of Shannon's theorems for communications channels is that as long as the channel capacity is not exceeded, the error rate may be made as small as desired by a sufficiently involved coding. This result also applies to the dissipation formula for molecular machines. So there is a precise upper bound on the number of choices a molecular machine can make for a given amount of energy loss. This result will be important for the design and construction of molecular computers.


Subject(s)
DNA, Energy Metabolism, Macromolecular Substances, Animals, Information Theory, Biological Models, Thermodynamics
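
The minimum dissipation stated in the abstract above, k_B T ln(2) joules per bit, is easy to evaluate numerically. The sketch below uses the exact SI value of the Boltzmann constant; the temperature of 300 K and the function names are illustrative choices, not values from the paper.

```python
from math import log

BOLTZMANN_J_PER_K = 1.380649e-23  # exact SI definition of k_B


def min_energy_per_bit(temperature_kelvin):
    """Minimum energy (joules) dissipated to gain one bit: k_B * T * ln 2."""
    return BOLTZMANN_J_PER_K * temperature_kelvin * log(2)


def max_bits(energy_joules, temperature_kelvin):
    """Upper bound on the number of binary choices for a given energy loss."""
    return energy_joules / min_energy_per_bit(temperature_kelvin)


epsilon_300 = min_energy_per_bit(300.0)  # illustrative temperature
print(epsilon_300)
print(max_bits(2 * epsilon_300, 300.0))
```

The second function restates the abstract's closing point: for a fixed energy loss there is an upper bound on how many choices a molecular machine can make.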
16.
J Theor Biol ; 148(1): 83-123, 1991 Jan 07.
Article in English | MEDLINE | ID: mdl-2016886

ABSTRACT

Like macroscopic machines, molecular-sized machines are limited by their material components, their design, and their use of power. One of these limits is the maximum number of states that a machine can choose from. The logarithm to the base 2 of the number of states is defined to be the number of bits of information that the machine could "gain" during its operation. The maximum possible information gain is a function of the energy that a molecular machine dissipates into the surrounding medium ($P_y$), the thermal noise energy which disturbs the machine ($N_y$) and the number of independently moving parts involved in the operation ($d_{space}$): $C_y = d_{space} \log_2[(P_y + N_y)/N_y]$ bits per operation. This "machine capacity" is closely related to Shannon's channel capacity for communications systems. An important theorem that Shannon proved for communication channels also applies to molecular machines. With regard to molecular machines, the theorem states that if the amount of information which a machine gains is less than or equal to $C_y$, then the error rate (frequency of failure) can be made arbitrarily small by using a sufficiently complex coding of the molecular machine's operation. Thus, the capacity of a molecular machine is sharply limited by the dissipation and the thermal noise, but the machine failure rate can be reduced to whatever low level may be required for the organism to survive.


Subject(s)
DNA, Information Theory, Macromolecular Substances, Animals, Biological Models, Thermodynamics
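
The machine capacity formula in the abstract above translates directly into code. The sketch below only evaluates the formula; the input values are made up, and the energies are taken to be in the same (arbitrary) units since only their ratio matters.

```python
from math import log2


def machine_capacity(d_space, p_y, n_y):
    """Machine capacity in bits per operation:
    C_y = d_space * log2((P_y + N_y) / N_y)."""
    if n_y <= 0:
        raise ValueError("thermal noise energy N_y must be positive")
    return d_space * log2((p_y + n_y) / n_y)


# Made-up example: 4 independent parts, dissipation 3 times the noise energy.
print(machine_capacity(4, 3.0, 1.0))
```

Note that with no dissipation (P_y = 0) the capacity is zero, however many moving parts the machine has.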
17.
J Theor Biol ; 189(4): 427-41, 1997 Dec 21.
Article in English | MEDLINE | ID: mdl-9446751

ABSTRACT

Related genetic sequences having a common function can be described by Shannon's information measure and depicted graphically by a sequence logo. Though useful for many purposes, sequence logos only show the average sequence conservation, and inferring the conservation for individual sequences is difficult. This limitation is overcome by the individual information ($R_i$) technique described here. The method begins by generating a weight matrix from the frequencies of each nucleotide or amino acid at each position of the aligned sequences. This matrix is then applied to the sequences themselves to determine the sequence conservation of each individual sequence. The matrix is unique because the average of these assignments is the total sequence conservation, and there is only one way to construct such a matrix. For binding sites on polynucleotides, the weight matrix has a natural cutoff that distinguishes functional sequences from other sequences. $R_i$ values are on an absolute scale measured in bits of information so the conservation of different biological functions can be compared with one another. The matrix can be used to rank-order the sequences, to search for new sequences, to compare sequences to other quantitative data such as binding energy or distance between binding sites, to distinguish mutations from polymorphisms, to design sequences of a given strength, and to detect errors in databases. The $R_i$ method has been used to identify previously undescribed but experimentally verified DNA binding sites. The individual information distribution was determined for E. coli ribosome binding sites, bacterial Fis binding sites, and human donor and acceptor splice junctions, among others. The distributions demonstrate clearly that the consensus sequence is highly unusual, and hence is a poor method to describe naturally occurring binding sites.


Subject(s)
Information Theory, Genetic Models, Polynucleotides/genetics, Animals, Binding Sites, Conserved Sequence, Factual Databases, Humans, Thermodynamics
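
The two steps described in the abstract above (build a weight matrix from base frequencies, then apply it to each sequence) can be sketched as follows. This simplified version assumes DNA with a 2-bit maximum per position and weights of 2 + log2(frequency), and it leaves out the small-sample correction. The aligned sites are made up and the function names are hypothetical.

```python
from collections import Counter
from math import log2


def weight_matrix(sites):
    """One dict per aligned position, mapping base -> 2 + log2(frequency) bits."""
    matrix = []
    for column in zip(*sites):
        n = len(column)
        matrix.append({base: 2.0 + log2(count / n)
                       for base, count in Counter(column).items()})
    return matrix


def individual_information(site, matrix):
    """Sum the weights of the bases in one sequence.
    A base never seen at a position contributes minus infinity."""
    return sum(col.get(base, float("-inf")) for base, col in zip(site, matrix))


# Made-up aligned binding sites, for illustration only.
aligned = ["ACGT", "ACGA", "ACCT", "ACGT"]
matrix = weight_matrix(aligned)
scores = [individual_information(s, matrix) for s in aligned]
mean_score = sum(scores) / len(scores)
print([round(s, 3) for s in scores], round(mean_score, 3))
```

In this sketch the mean of the individual scores equals the total conservation of the aligned set (the sum over positions of 2 minus the uncertainty), which is the property the abstract gives as the reason the matrix is unique.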
18.
Nucleic Acids Res ; 25(21): 4408-15, 1997 Nov 01.
Article in English | MEDLINE | ID: mdl-9336476

ABSTRACT

A graphical method is presented for displaying how binding proteins and other macromolecules interact with individual bases of nucleotide sequences. Characters representing the sequence are either oriented normally and placed above a line indicating favorable contact, or upside-down and placed below the line indicating unfavorable contact. The positive or negative height of each letter shows the contribution of that base to the average sequence conservation of the binding site, as represented by a sequence logo. These sequence 'walkers' can be stepped along raw sequence data to visually search for binding sites. Many walkers, for the same or different proteins, can be simultaneously placed next to a sequence to create a quantitative map of a complex genetic region. One can alter the sequence to quantitatively engineer binding sites. Database anomalies can be visualized by placing a walker at the recorded positions of a binding molecule and by comparing this to locations found by scanning the nearby sequences. The sequence can also be altered to predict whether a change is a polymorphism or a mutation for the recognizer being modeled.


Subject(s)
Base Sequence/genetics, DNA-Binding Proteins/metabolism, DNA/genetics, RNA-Binding Proteins/metabolism, Software, DNA/metabolism, Factual Databases, Mathematics, RNA/metabolism
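
Stepping a walker along raw sequence, as the abstract above describes, amounts to sliding a weight matrix over the sequence and recording a signed height for each letter. The sketch below assumes a tiny two-position matrix with made-up weights in bits and treats a total above zero as a candidate site. Both the weights and the zero threshold are assumptions for illustration.

```python
def walk(sequence, matrix):
    """Slide the matrix along the sequence. For each start position return
    (start, total_bits, letter_heights). Positive heights would be drawn
    upright above the line, negative ones upside-down below it."""
    width = len(matrix)
    steps = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        heights = [col.get(base, float("-inf")) for base, col in zip(window, matrix)]
        steps.append((start, sum(heights), heights))
    return steps


def candidate_sites(steps, threshold_bits=0.0):
    """Keep positions whose total exceeds the threshold (assumed to be 0 bits)."""
    return [step for step in steps if step[1] > threshold_bits]


# Made-up two-position weight matrix (bits), for illustration only.
toy_matrix = [
    {"A": 1.5, "C": -1.3, "G": -1.3, "T": -1.3},
    {"A": -1.3, "C": 1.0, "G": -1.3, "T": 0.3},
]
steps = walk("GACT", toy_matrix)
hits = candidate_sites(steps)
print(hits)
```

Changing one base in the input and rerunning the walk shows how each letter's contribution shifts, which is the kind of comparison the abstract suggests for judging whether a sequence change weakens a site.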
19.
Nucleic Acids Res ; 27(3): 882-7, 1999 Feb 01.
Article in English | MEDLINE | ID: mdl-9889287

ABSTRACT

In vitro experiments that characterize DNA-protein interactions by artificial selection, such as SELEX, are often performed with the assumption that the experimental conditions are equivalent to natural ones. To test whether SELEX gives natural results, we compared sequence logos composed from naturally occurring leucine-responsive regulatory protein (Lrp) binding sites with those composed from SELEX-generated binding sites. The sequence logos were significantly different, indicating that the binding conditions are disparate. A likely explanation is that the SELEX experiment selected for a dimeric or trimeric Lrp complex bound to DNA. In contrast, natural sites appear to be bound by a monomer. This discrepancy suggests that in vitro selections do not necessarily give binding site sets comparable with the natural binding sites.


Subject(s)
DNA-Binding Proteins/metabolism, DNA/metabolism, Information Systems, Molecular Biology/methods, Genetic Selection, Base Sequence, Binding Sites, DNA/chemistry, DNA Footprinting, DNA Probes, Dimerization, Leucine, Leucine-Responsive Regulatory Protein, Ligands, Theoretical Models, Molecular Sequence Data, Sequence Alignment, Transcription Factors
20.
Hum Mutat ; 6(1): 74-6, 1995.
Article in English | MEDLINE | ID: mdl-7550236

ABSTRACT

Predicting the effects of nucleotide substitutions in human splice sites has been based on analysis of consensus sequences. We used a graphic representation of sequence conservation and base frequency, the sequence logo, to demonstrate that a change in a splice acceptor of hMSH2 (a gene associated with familial nonpolyposis colon cancer) probably does not reduce splicing efficiency. This confirms a population genetic study that suggested that this substitution is a genetic polymorphism. The information theory-based sequence logo is quantitative and more sensitive than the corresponding splice acceptor consensus sequence for detection of true mutations. Information analysis may potentially be used to distinguish polymorphisms from mutations in other types of transcriptional, translational, or protein-coding motifs.


Subject(s)
Base Sequence, Mutation, Genetic Polymorphism, RNA Splicing/genetics, Consensus Sequence, Humans