ABSTRACT
BACKGROUND: The comparison of complete genomes has revealed surprisingly large numbers of conserved non-protein-coding (CNC) DNA regions. However, the biological function of CNC remains elusive. CNC differ in two aspects from conserved protein-coding regions. They are not conserved across phylum boundaries, and they do not contain readily detectable sub-domains. Here we characterize the persistence length and time of CNC and conserved protein-coding regions in the vertebrate and insect lineages. RESULTS: The persistence length is the length of a genome region over which a certain level of sequence identity is consistently maintained. The persistence time is the evolutionary period during which a conserved region evolves under the same selective constraints. Our main findings are: (i) Insect genomes contain 1.60 times less conserved information than vertebrates; (ii) Vertebrate CNC have a higher persistence length than conserved coding regions or insect CNC; (iii) CNC have shorter persistence times as compared to conserved coding regions in both lineages. CONCLUSION: Higher persistence length of vertebrate CNC indicates that the conserved information in vertebrates and insects is organized in functional elements of different lengths. These findings might be related to the higher morphological complexity of vertebrates and give clues about the structure of active CNC elements. Shorter persistence time might explain the previously puzzling observations of highly conserved CNC within each phylum, and of a lack of conservation between phyla. It suggests that CNC divergence might be a key factor in vertebrate evolution. Further evolutionary studies will help to relate individual CNC to specific developmental processes.
Subjects
Intergenic DNA/genetics , Molecular Evolution , Genome/genetics , Vertebrates/genetics , Animals , Conserved Sequence , Drosophila/genetics , Insect Genome/genetics , Humans , Time Factors
ABSTRACT
BACKGROUND: Cleavage of messenger RNA (mRNA) precursors is an essential step in mRNA maturation. The signal recognized by the cleavage enzyme complex has been characterized as an A-rich region upstream of the cleavage site containing a motif with consensus AAUAAA, followed by a U- or UG-rich region downstream of the cleavage site. RESULTS: We studied these signals using exhaustive databases of cleavage sites obtained by aligning raw expressed sequence tag (EST) sequences to genomic sequences in Homo sapiens and Drosophila melanogaster. These data show that the polyadenylation signal is highly conserved in human and fly. In addition, de novo motif searches generated a refined description of the U-rich downstream sequence element (DSE), which shows more divergence between the two species. These refined motifs are applied, within a Hidden Markov Model (HMM) framework, to predict mRNA cleavage sites. CONCLUSION: We demonstrate that the DSE is a specific motif in both human and Drosophila. These findings shed light on the sequence correlates of a highly conserved biological process, and improve in silico prediction of 3' mRNA cleavage and polyadenylation sites.
Subjects
Drosophila melanogaster/genetics , Poly A/genetics , Polyadenylation/genetics , 3' Untranslated Regions/genetics , Animals , Base Composition/genetics , Base Sequence , Expressed Sequence Tags , Humans , Genetic Models , Post-Transcriptional RNA Processing , Messenger RNA/genetics
ABSTRACT
BACKGROUND: In studies of gene regulation, the efficient computational detection of over-represented transcription factor binding sites is increasingly important. Several published methods can be used for testing whether a set of hypothesised co-regulated genes share a common regulatory regime based on the occurrence of the modelled transcription factor binding sites. However, there is little or no information available to guide the end user's choice of method. Furthermore, it would be necessary to obtain several different software programs from various sources to make a well-founded choice. METHODOLOGY: We introduce a software package, Asap, for fast searching with position weight matrices, which includes several standard methods for assessing over-representation. We have compared the ability of these methods to detect over-represented transcription factor binding sites in artificial promoter sequences. By controlling all aspects of our input data, we are able to identify the optimal statistics across multiple threshold values and for sequence sets containing different distributions of transcription factor binding sites. CONCLUSIONS: We show that our implementation is significantly faster than more naïve scanning algorithms when searching with many weight matrices in large sequence sets. When comparing the various statistics, we show that those based on binomial over-representation and Fisher's exact test perform almost equally well, and better than the others. An online server is available at http://servers.binf.ku.dk/asap/.
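The two statistics named in the conclusions can be illustrated with a minimal sketch. This is not Asap's implementation: the function names, the hit counts, and the choice of a one-sided test are assumptions made for illustration only, and only the Python standard library is used.

```python
from math import comb


def binomial_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p).

    Hypothetical use: n candidate positions, background hit probability p,
    k observed binding-site hits.
    """
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def fisher_one_sided(hits_fg, n_fg, hits_bg, n_bg):
    """One-sided Fisher's exact test on sequence counts.

    hits_fg of n_fg foreground sequences contain a site, hits_bg of n_bg
    background sequences contain one. Returns the hypergeometric probability
    of seeing at least hits_fg foreground hits by chance.
    """
    total = n_fg + n_bg
    total_hits = hits_fg + hits_bg
    denom = comb(total, n_fg)
    upper = min(n_fg, total_hits)
    numer = sum(
        comb(total_hits, i) * comb(total - total_hits, n_fg - i)
        for i in range(hits_fg, upper + 1)
    )
    return numer / denom


if __name__ == "__main__":
    # Illustrative numbers only, not taken from the study.
    print(binomial_tail(8, 1000, 0.002))
    print(fisher_one_sided(12, 50, 40, 1000))
```

In both cases a small value suggests that the site occurs more often in the candidate set than the background would predict; the binomial version works on position-level hit counts, while the Fisher version compares how many sequences in each set contain at least one hit.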
Subjects
Algorithms , Binding Sites , Statistical Models , Transcription Factors , Software , Time Factors
ABSTRACT
Many testis-specific genes from the sex chromosomes are subject to rapid evolution, which can make it difficult to identify orthologs of murine genes in the human genome. The murine CYPT gene family includes 15 members, but orthologs were undetectable in the human genome. However, using a refined homology search, sequences corresponding to the shared promoter region of the CYPT family were identified at 39 loci. Most loci were located immediately upstream of genes belonging to the VCX/Y, SPANX, or CSAG gene families. Sequence comparison of the loci revealed a conserved CYPT promoter-like (CPL) element featuring TATA and CCAAT boxes. The expression of members of the three families harboring the CPL resembled the murine expression of the CYPT family, with weak expression in late pachytene spermatocytes and predominant expression in spermatids, but some genes were also weakly expressed in somatic cells and in other germ cell types. The genomic regions harboring the gene families were rich in direct and inverted segmental duplications (SD), which may facilitate gene conversion and rapid evolution. The conserved CPL and the common expression profiles suggest that the human VCX/Y, SPANX, and CSAG2 gene families together with the murine SPANX gene and the CYPT family may share a common ancestor. Finally, we present evidence that VCX/Y and SPANX may be paralogs with a similar protein structure consisting of C-terminal acidic repeats of variable lengths.
Subjects
Neoplasm Antigens/genetics , Molecular Evolution , Multigene Family , Neoplasm Proteins/genetics , Nuclear Proteins/genetics , Genetic Promoter Regions , Proteins/genetics , Amino Acid Sequence , Animals , Base Sequence , Conserved Sequence , DNA Primers , Expressed Sequence Tags , Humans , Mice , Molecular Sequence Data
ABSTRACT
Remorins form a superfamily of plant-specific plasma membrane/lipid-raft-associated proteins of unknown structure and function. Using specific antibodies, we localized tomato remorin 1 to apical tissues, leaf primordia and vascular traces. The deduced remorin protein sequence contains a predicted coiled-coil domain, suggesting its participation in protein-protein interactions. Circular dichroism revealed that recombinant potato remorin contains an alpha-helical region that forms a functional coiled-coil domain. Electron microscopy of purified preparations of four different recombinant remorins, one from potato, two divergent isologs from tomato, and one from Arabidopsis thaliana, demonstrated that the proteins form highly similar filamentous structures. The diameters of the negatively stained filaments ranged from 4.6-7.4 nm for potato remorin 1, 4.3-6.2 nm for tomato remorin 1, 5.7-7.5 nm for tomato remorin 2, and 5.7-8.0 nm for Arabidopsis Dbp. Highly polymerized remorin 1 was detected in glutaraldehyde-crosslinked tomato plasma membrane preparations, and a population of the protein was immunolocalized in tomato root tips to structures associated with discrete regions of the plasma membrane.
Subjects
Carrier Proteins/analysis , Meristem/chemistry , Phosphoproteins/analysis , Plant Proteins/analysis , Plants/chemistry , Amino Acid Sequence , Carrier Proteins/genetics , Carrier Proteins/ultrastructure , Circular Dichroism , Immunoblotting , Solanum lycopersicum/chemistry , Solanum lycopersicum/genetics , Membrane Proteins/analysis , Membrane Proteins/genetics , Meristem/genetics , Confocal Microscopy , Electron Microscopy , Molecular Sequence Data , Molecular Weight , Oligopeptides/analysis , Oligopeptides/genetics , Phosphoproteins/genetics , Phosphoproteins/ultrastructure , Plant Proteins/genetics , Plant Proteins/ultrastructure , Plant Roots/chemistry , Plant Roots/genetics , Plant Shoots/chemistry , Plant Shoots/genetics , Plants/embryology , Plants/genetics , Recombinant Proteins/analysis , Solanum tuberosum/chemistry , Solanum tuberosum/genetics
ABSTRACT
BACKGROUND: HIV-1-infected patients vary considerably in their response to antiretroviral treatment, drug concentrations in plasma, toxic events, and rate of immune recovery. This variability could have a genetic basis. We did a pharmacogenetics study to analyse the association between response to antiretroviral treatment and allelic variants of several genes. METHODS: In 123 patients, we did PCR analyses of the gene for the multidrug-resistance transporter (MDR1), which codes for P-glycoprotein, of genes coding for isoenzymes of cytochrome P450, CYP3A4, CYP3A5, CYP2D6, and CYP2C19, and of the gene for the chemokine receptor CCR5. We measured concentrations in plasma of the antiretroviral agents efavirenz and nelfinavir by high-performance liquid chromatography, and measured levels of P-glycoprotein expression, CD4-cell count, and HIV-1 viraemia. FINDINGS: Median drug concentrations in patients with the MDR1 3435 TT, CT, and CC genotypes were at the 30th, 50th, and 75th percentiles, respectively (p=0.0001). In patients with CYP2D6 extensive-metaboliser or poor-metaboliser alleles, median drug concentrations were at percentiles 45 and 62.5, respectively (p=0.04). Six months after starting treatment, patients with the MDR1 TT genotype had a greater rise in CD4-cell count (257 cells/microL) than patients with the CT (165 cells/microL) and CC (121 cells/microL) genotypes (p=0.0048), and the best recovery of naïve CD4-cells. INTERPRETATION: The polymorphism MDR1 3435 C/T predicts immune recovery after initiation of antiretroviral treatment. This finding suggests that P-glycoprotein has an important role in admittance of antiretroviral drugs to restricted compartments in vivo.