Results 1 - 20 of 23
1.
Life (Basel) ; 6(3), 2016 Aug 03.
Article in English | MEDLINE | ID: mdl-27527218

ABSTRACT

The existence of multiple copies of genes is a well-known phenomenon. A gene family is a set of sufficiently similar genes, formed by gene duplication. In earlier works conducted on a limited number of completely sequenced and annotated genomes, it was found that the size of gene families and the size of the genome are positively correlated. Additionally, it was found that several atypical microbes deviated from the observed general trend. In this study, we reexamined these associations on a larger dataset consisting of 1484 prokaryotic genomes and using several ranking approaches. We applied ranking methods in such a way that genomes with lower numbers of gene copies would have lower rank. Until now only simple ranking methods were used; we applied the Kemeny optimal aggregation approach as well. Regression and correlation analyses were used to quantify and characterize the relationships between paralog indices and genome size. In addition, boxplot analysis was employed as a method for outlier detection. We found that, in general, all paralog indices correlate positively with genome size. As expected, different groups of atypical prokaryotic genomes were found for different types of paralog quantities. Mycoplasmataceae and Halobacteria appeared to be among the most interesting candidates for further research on evolution through gene duplication.
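The Kemeny optimal aggregation mentioned in this abstract can be illustrated with a minimal sketch. It assumes each input ranking is a complete ordering of the same items and finds the consensus by brute force, which is only feasible for a handful of items; a dataset of 1484 genomes would need a heuristic or an exact solver, and the abstract does not say which was used. The function names and the toy genome labels are hypothetical.

```python
from itertools import combinations, permutations


def kendall_tau_distance(rank_a, rank_b):
    """Count the item pairs that the two rankings order differently."""
    pos_a = {item: i for i, item in enumerate(rank_a)}
    pos_b = {item: i for i, item in enumerate(rank_b)}
    return sum(
        1
        for x, y in combinations(rank_a, 2)
        if (pos_a[x] - pos_a[y]) * (pos_b[x] - pos_b[y]) < 0
    )


def kemeny_consensus(rankings):
    """Return the ordering that minimizes the summed Kendall tau distance
    to all input rankings (exhaustive search, exponential in item count)."""
    items = sorted(rankings[0])
    best = min(
        permutations(items),
        key=lambda cand: sum(kendall_tau_distance(cand, r) for r in rankings),
    )
    return list(best)


# Three hypothetical rankings of genomes A, B, C by different paralog indices.
rankings = [["A", "B", "C"], ["A", "C", "B"], ["B", "A", "C"]]
consensus = kemeny_consensus(rankings)
```

The consensus is the ordering that disagrees with the input rankings on the fewest pairwise comparisons in total.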

2.
Biol Direct ; 11(1): 2, 2016 Jan 08.
Article in English | MEDLINE | ID: mdl-26747447

ABSTRACT

BACKGROUND: The length of a protein sequence is largely determined by its function. In certain species, it may also be affected by additional factors, such as growth temperature or acidity. In 2002, it was shown that in the bacterium Escherichia coli and in the archaeon Archaeoglobus fulgidus, protein sequences with no homologs were, on average, shorter than those with homologs (BMC Evol Biol 2:20, 2002). It is now generally accepted that in bacterial and archaeal genomes the distributions of protein length differ between sequences with and without homologs. In this study, we examine this postulate by conducting a comprehensive analysis of all annotated prokaryotic genomes and by focusing on certain exceptions. RESULTS: We compared the distribution of lengths of "having homologs proteins" (HHPs) and "non-having homologs proteins" (orphans or ORFans) in all currently completely sequenced and COG-annotated prokaryotic genomes. As expected, the HHPs and ORFans have strikingly different length distributions in almost all genomes. As previously established, the HHPs are indeed, on average, longer than the ORFans, and the length distributions for the ORFans have a relatively narrow peak, in contrast to the HHPs, whose lengths spread over a wider range of values. However, about thirty genomes do not obey these rules. Practically all genomes of Mycoplasma and Ureaplasma have atypical ORFan distributions, with the mean length of ORFans greater than the mean length of HHPs. These genera constitute over 80 % of atypical genomes. CONCLUSIONS: We confirmed on a comprehensive set of genomes the previous observation that HHPs and ORFans have different gene length distributions. We also showed that Mycoplasmataceae genomes have very distinctive distributions of ORFan lengths. We offer several possible biological explanations of this phenomenon, such as an adaptation to Mycoplasmataceae's ecological niche, specifically its "quiet" co-existence with host organisms, resulting in long ABC transporters.


Subjects
Bacterial Proteins/metabolism , Mycoplasmataceae/metabolism , Bacterial Proteins/genetics , Genome, Bacterial/genetics , Mycoplasmataceae/genetics , Open Reading Frames/genetics
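The criterion for an atypical genome described in this abstract (mean ORFan length exceeding mean HHP length) can be sketched as follows. This is an illustration only: the input layout, the function name, and the toy lengths are assumptions, and the study itself compared full length distributions rather than means alone.

```python
from statistics import mean


def atypical_genomes(genomes):
    """Return names of genomes whose ORFans are, on average, longer than
    their HHPs.

    genomes: dict mapping a genome name to a pair
             (hhp_lengths, orfan_lengths), each a list of protein lengths.
    Genomes with an empty list in either group are skipped.
    """
    return [
        name
        for name, (hhp, orfan) in genomes.items()
        if hhp and orfan and mean(orfan) > mean(hhp)
    ]


# Hypothetical toy data, not taken from the paper.
toy = {
    "typical_genome": ([300, 400, 500], [100, 150]),
    "atypical_genome": ([300, 320], [600, 700]),
}
flagged = atypical_genomes(toy)
```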
3.
Biomed Res Int ; 2015: 786861, 2015.
Article in English | MEDLINE | ID: mdl-26114113

ABSTRACT

Proteins of the same functional family (for example, kinases) may have significantly different lengths. It is an open question whether such variation in length is random or arises as a response to some unknown evolutionary driving factors. The main purpose of this paper is to demonstrate the existence of factors affecting prokaryotic gene lengths. We believe that ranking genomes according to the lengths of their genes, followed by the calculation of coefficients of association between genome rank and genome property, is a reasonable approach to revealing such evolutionary driving factors. As we demonstrated earlier, our chosen approach, Bubble Sort, combines stability, accuracy, and computational efficiency as compared to other ranking methods. Application of Bubble Sort to a set of 1390 prokaryotic genomes confirmed that genes of Archaeal species are generally shorter than Bacterial ones. We observed that gene lengths are affected by various factors: within each domain, different phyla have preferences for short or long genes; thermophiles tend to have shorter genes than soil-dwellers; halophiles tend to have longer genes. We also found that species with overrepresentation of cytosines and guanines in the third position of the codon (GC3 content) tend to have longer genes than species with low GC3 content.


Subjects
Archaea/genetics , Archaeal Proteins/genetics , Bacterial Proteins/genetics , Evolution, Molecular , Codon , Genome, Archaeal , Genome, Bacterial
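A bubble-sort ranking of genomes by gene length might look like the sketch below. The abstract does not give the pairwise comparison rule, so the one used here is an assumption: a genome counts as "shorter" than another when its gene is shorter in the majority of the gene families the two share. The function names and toy data are hypothetical. Such a majority relation need not be transitive, which is one reason a procedure built on adjacent comparisons is a natural fit.

```python
def shorter_in_majority(a, b):
    """True if genome a has the shorter gene in most families shared with b.

    a, b: dicts mapping a gene-family identifier to a gene length.
    This comparison rule is an illustrative assumption.
    """
    shared = a.keys() & b.keys()
    wins = sum(a[f] < b[f] for f in shared)
    losses = sum(a[f] > b[f] for f in shared)
    return wins > losses


def bubble_rank(genomes):
    """Order genome names from shortest to longest genes using bubble sort
    with the pairwise comparator above."""
    names = list(genomes)
    for end in range(len(names) - 1, 0, -1):
        swapped = False
        for i in range(end):
            # Swap neighbours when the later genome has shorter genes.
            if shorter_in_majority(genomes[names[i + 1]], genomes[names[i]]):
                names[i], names[i + 1] = names[i + 1], names[i]
                swapped = True
        if not swapped:
            break
    return names


# Hypothetical toy data: gene lengths per family for three genomes.
toy = {
    "g1": {"f1": 300, "f2": 500, "f3": 400},
    "g2": {"f1": 250, "f2": 450, "f3": 420},
    "g3": {"f1": 200, "f2": 400},
}
ranking = bubble_rank(toy)
```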
4.
Biomed Res Int ; 2013: 472163, 2013.
Article in English | MEDLINE | ID: mdl-24371824

ABSTRACT

Ancestral sequence reconstruction is a well-known problem in molecular evolution. The problem presented in this study is inspired by sequence reconstruction, but instead of leaf-associated sequences we consider only their lengths. We call this problem ancestral gene length reconstruction. The input is a tree together with nonnegative integers associated with its leaves, and the task is to find an optimal labeling of the internal nodes that minimizes the sum of the edge lengths. In this paper we give a linear-time algorithm to solve the problem on binary trees for the Manhattan cost function s(v, w) = |π(v) - π(w)|.


Subjects
Conserved Sequence/genetics , Evolution, Molecular , Models, Theoretical , Algorithms , Sequence Analysis, DNA
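One linear-time way to solve this problem on binary trees is the classical interval method of Wagner parsimony, sketched below. It is not necessarily the algorithm of the paper. A bottom-up pass gives every node an interval of optimal labels (the intersection of the child intervals, or the gap between them when they are disjoint), and a top-down pass picks, for each node, the value in its interval closest to the parent's label. The tree encoding (an int for a leaf, a 2-tuple for an internal node) and all function names are assumptions made for this example.

```python
def bottom_up(tree):
    """Return (lo, hi, cost, children) for the subtree.

    tree: an int (leaf length) or a pair (left_subtree, right_subtree).
    [lo, hi] is the interval of optimal labels for the subtree root and
    cost is the minimal sum of |label(v) - label(w)| over subtree edges.
    """
    if isinstance(tree, int):
        return (tree, tree, 0, None)
    left, right = bottom_up(tree[0]), bottom_up(tree[1])
    lo = max(left[0], right[0])
    hi = min(left[1], right[1])
    gap = max(0, lo - hi)  # zero when the child intervals overlap
    return (min(lo, hi), max(lo, hi), left[2] + right[2] + gap, (left, right))


def top_down(node, parent_label=None):
    """Assign labels: the root takes the low end of its interval, every
    other node takes the value in its interval closest to its parent."""
    lo, hi, _, children = node
    label = lo if parent_label is None else min(max(parent_label, lo), hi)
    if children is None:
        return label
    return (label, top_down(children[0], label), top_down(children[1], label))


def reconstruct_lengths(tree):
    """Return (minimal total cost, labeled tree).

    Internal nodes of the labeled tree are (label, left, right) triples;
    leaves stay plain ints.
    """
    info = bottom_up(tree)
    return info[2], top_down(info)


cost, labeled = reconstruct_lengths(((1, 5), 10))
```

For the tree `((1, 5), 10)` the inner node and the root both receive label 5, so the edges cost 4 + 0 + 0 + 5 = 9. Each node is visited once per pass, hence the linear running time; very deep trees would need an iterative traversal to avoid Python's recursion limit.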
6.
Comput Biol Chem ; 40: 20-9, 2012 Oct.
Article in English | MEDLINE | ID: mdl-22940609

ABSTRACT

In this paper, we propose a method to classify prokaryotic genomes using the agglomerative information bottleneck method for unsupervised clustering. Although the method we present here is closely related to a group of methods based on detecting the presence or absence of genes, our method is different because it uses gene lengths as well. We show that this amended method is reliable. For robustness evaluation, we apply bootstrap and jackknife techniques to the input data. As a result, we are able to propose an approach to determine the stability level of a cladogram. We demonstrate that the genome tree produced for a selected small group of genomes closely resembles a phylogenetic tree of this group.


Subjects
Bacteria/genetics , Computational Biology/methods , Genome, Bacterial/genetics , Algorithms , Bacteria/classification , Bacterial Proteins/genetics , Crenarchaeota , Databases, Genetic , Phylogeny
7.
Comput Biol Chem ; 40: 1-6, 2012 Oct.
Article in English | MEDLINE | ID: mdl-22902951

ABSTRACT

Parvovirus B19 has an extreme tropism for human erythroid progenitors. Here we propose a hypothesis explaining the tropism of human parvovirus B19. Our speculations are based on experimental results related to the capsid proteins VP1 and VP2. These proteins were not detectable in nonpermissive cells in the course of these experiments, although the corresponding mRNAs were synthesized. Our interpretation of these results is that translation in nonpermissive cells is inhibited by human miRNAs. We provide support for our hypothesis and propose a detailed experimental procedure to test it.


Subjects
MicroRNAs/metabolism , Parvovirus B19, Human/growth & development , Parvovirus B19, Human/genetics , RNA, Messenger/antagonists & inhibitors , Amino Acid Sequence , Computational Biology , Host Specificity , Humans , Introns , MicroRNAs/genetics , Molecular Sequence Data , Parvovirus B19, Human/drug effects , RNA, Messenger/genetics
8.
Bioinform Biol Insights ; 6: 317-27, 2012.
Article in English | MEDLINE | ID: mdl-23300345

ABSTRACT

In this paper we present a novel method for genome ranking according to gene lengths. The main outcomes described in this paper are the following: the formulation of the genome ranking problem, a presentation of relevant approaches to solving it, and a demonstration of preliminary results from the ordering of prokaryotic genomes. Using a subset of prokaryotic genomes, we attempted to uncover factors affecting gene length. We have demonstrated that hyperthermophilic species have shorter genes than mesophilic organisms, which probably means that environmental factors affect gene length. Moreover, these preliminary results show that environmental factors group evolutionarily distant species together in the ranking.

9.
Comput Biol Chem ; 33(4): 275-82, 2009 Aug.
Article in English | MEDLINE | ID: mdl-19646927

ABSTRACT

The advancement in Escherichia coli genome research has made information regarding the transcription start sites of many genes available. A study relying on the availability of transcription start locations was performed. The first question addressed was what an average DNA curvature profile upstream of genes would look like when these genes are aligned by transcription start sites in comparison to alignment by translation start sites. Since it was hypothesized that curvature plays a role in transcription regulation, the expectation was that curvature measurements relative to transcription starts, rather than translation starts, should strengthen the signal. Our study justified this expectation. The second question aimed to clarify the relation between DNA curvature and promoter strength. Through clustering based on DNA curvature profiles along promoter regions, a strong positive correlation between promoter strength and curved DNA was found. The third question dealt with dinucleotide periodicity in E. coli to see whether a periodicity pattern specific to promoter regions exists. Such a previously unknown pattern might shed new light on transcription regulation mechanisms in E. coli. A sequence periodicity of about 11 bp is characteristic of the whole E. coli genome, and is especially pronounced in intergenic regions. Here it was shown that regions of about 100-150 bp centered 70-100 bp upstream of transcription starts carry a hidden periodicity with a period of about 10.3 bp.


Subjects
Escherichia coli/genetics , Gene Expression Regulation, Bacterial , Promoter Regions, Genetic , Transcription, Genetic , DNA, Intergenic , Terminator Regions, Genetic
10.
Mol Ecol ; 18(9): 2063-75, 2009 May.
Article in English | MEDLINE | ID: mdl-19344351

ABSTRACT

'Evolution Canyon' (ECI) at Lower Nahal Oren, Mount Carmel, Israel, is an optimal natural microscale model for unravelling evolution in action, highlighting the twin evolutionary processes of adaptation and speciation. A major model organism in ECI is wild barley, Hordeum spontaneum, the progenitor of cultivated barley, which displays dramatic interslope adaptive and speciational divergence on the 'African' dry slope (AS) and the 'European' humid slope (ES), separated on average by 200 m. Here we examined interslope single nucleotide polymorphism (SNP) sequences and the expression diversity of the drought-resistant dehydrin 1 gene (Dhn1) between the opposite slopes. We analysed 47 plants (genotypes), 4-10 individuals in each of seven stations (populations) in an area of 7000 m², for Dhn1 sequence diversity located in the 5' upstream flanking region of the gene. We found significant levels of Dhn1 genic diversity represented by 29 haplotypes, derived from 45 SNPs in a total of 708 bp sites. Most of the haplotypes, 25 out of 29 (= 86.2%), were represented by one genotype and hence were unique to one population. Only a single haplotype was common to both slopes. Genetic divergence of sequence and haplotype diversity was generally and significantly different among the populations and slopes. Nucleotide diversity was higher on the AS, whereas haplotype diversity was higher on the ES. Interslope divergence was significantly higher than intraslope divergence. Tajima's D test rejected neutrality of the SNP diversity. The Dhn1 expression under dehydration indicated interslope divergent expression between AS and ES genotypes, reinforcing the association of Dhn1 with drought resistance of wild barley at 'Evolution Canyon'. These results are inexplicable by mutation, gene flow, or chance effects, and support adaptive natural microclimatic selection as the major evolutionary divergent driving force.


Subjects
Genetic Speciation , Genetics, Population , Hordeum/genetics , Plant Proteins/genetics , Climate , DNA, Plant/genetics , Droughts , Ecosystem , Evolution, Molecular , Genes, Plant , Genotype , Israel , Polymorphism, Single Nucleotide , Selection, Genetic , Sequence Alignment , Sequence Analysis, DNA
11.
Comput Biol Chem ; 32(1): 17-28, 2008 Feb.
Article in English | MEDLINE | ID: mdl-17983838

ABSTRACT

BACKGROUND: Given a large sequence fragment or a set of functionally related sequences, we consider two problems of sequence analysis associated with the given sequence(s). The first problem is to measure sequence complexity (repetitiveness, compactness) to estimate how informative the set as a whole is. Usually an obtained measure should be compared with an appropriate random background calculated using permutation of the given sequences. We propose a novel and effective approach for background information measurement instead of the usual sequence reshuffling. The second problem is to detect a periodic bias to determine if it is one of the set features. Sequence periodicity, sometimes understood as hidden periodicity, is a very basic genomic property. The sequence period of 3, which is considered to characterize coding sequences, and the period of 10-11, which may be due to the alternation of hydrophobic and hydrophilic amino acids, DNA curvature, and bendability, were discovered and described. Searching for periodical biases brought significant results in the study of sequence-dependent nucleosome positioning: nucleosomal sites carry a hidden period of about 10.4 bases. RESULTS: Calculated differences between genomic sequences and background showed high biological relevance of the method that we proposed in this study. Our algorithm was applied to a few natural and artificial datasets. We constructed a simple "periodic" dataset by replacing every tenth dinucleotide in each sequence of a trial set with the same dinucleotide "CC". We showed that the method reveals the introduced periodicity and that this periodical pattern carries higher information than in uninterrupted subsequences. An application of the method to the nucleosomal dataset revealed a weak pseudo-periodicity of 10.4 nucleotides, confirming previous knowledge. An application of the method to Escherichia coli datasets revealed the well-known periodicity of 3 bp as a genic attribute, a secondary genic period slightly larger than 11 bp, and an intergenic period a bit smaller than 11 bp. CONCLUSIONS: We reported a novel compositional complexity-based method for sequence analysis. We found that the difference between the sequence complexity of a natural sequence and that of the background is especially high for a set consisting exclusively of coding sequences. Hidden periodicities were found with no need for any preliminary assumptions regarding the composition of periodic elements. We illustrated the power of the method by studying sets with known weak periodic properties: a nucleosomal database and sets of different regions of E. coli. We showed that the method conveniently indicated all kinds of periodicity and related features in these sets of DNA sequences.


Subjects
Algorithms , Databases, Nucleic Acid , Escherichia coli Proteins/chemistry , Periodicity , Base Sequence , Molecular Sequence Data , Pattern Recognition, Automated , Software
12.
Nucleic Acids Res ; 34(8): 2316-27, 2006.
Article in English | MEDLINE | ID: mdl-16679450

ABSTRACT

It is known that DNA curvature plays a certain role in gene regulation. The distribution of curved DNA in promoter regions is evolutionarily preserved, and it is mainly determined by the temperature of the habitat. However, very little is known about the distribution of DNA curvature in termination sites. Our main objective was to comprehensively analyze the distribution of curved sequences upstream and downstream of the coding genes in prokaryotic genomes. We applied CURVATURE software to 170 complete prokaryotic genomes in a search for a possible typical distribution of DNA curvature around starts and ends of genes. Performing cluster analyses and other statistical tests, we obtained novel results regarding various factors influencing curvature distribution in intergenic regions, such as growth temperature, A+T composition and genome size. We also analyzed intergenic regions between converging genes in 15 selected genomes. The results show that six genomes presented peaks of curvature excess larger than 3 SDs. Insufficient statistics did not allow us to draw further conclusions. Our hypothesis is that DNA curvature could affect transcription termination in many prokaryotes either directly, through contacts with RNA polymerase, or indirectly, via contacts with some regulatory proteins.


Subjects
DNA, Archaeal/chemistry , DNA, Bacterial/chemistry , Promoter Regions, Genetic , Terminator Regions, Genetic , Cluster Analysis , Genome, Archaeal , Genome, Bacterial , Genomics , Nucleic Acid Conformation
13.
Biosystems ; 81(3): 208-22, 2005 Sep.
Article in English | MEDLINE | ID: mdl-15936870

ABSTRACT

With the availability of genome sequences, the possibility of new phylogenetic reconstructions arises in order to reveal genomic relationships among organisms. According to the compositional-spectra (CS) approach proposed in our previous studies, any genomic sequence can be characterized by a distribution of frequencies of imperfect matching of words (oligonucleotides). In the current application of CS-analysis, we attempted to analyze the cluster structure of genomes across life. It appeared that compositional spectra show a clear three-group clustering of the compared prokaryotic and eukaryotic genomes. Unexpectedly, this grouping seriously differs from the classical Universal Tree of Life structure represented by common kingdoms known as Eubacteria, Archaebacteria, and Eukarya. The revealed CS-clustering displays high stability, putatively reflecting its objective nature, and still enigmatic biological significance that may result from convergent evolution driven by ecological selection. We believe that our approach provides a new and wider (compared to traditional methods) perspective of extracting genomic information of high evolutionary relevance.


Subjects
Classification/methods , Genome/genetics , Genomics/methods , Oligonucleotides/genetics , Phylogeny , Base Composition , Base Sequence/genetics , Cluster Analysis , Computational Biology/methods , Species Specificity
14.
Nucleic Acids Res ; 32(19): 5907-15, 2004.
Article in English | MEDLINE | ID: mdl-15528638

ABSTRACT

The centromere sequence parC of Escherichia coli low-copy-number plasmid R1 consists of two sets of 11 bp iterated sequences. Here we analysed the intrinsic sequence-directed curvature of parC by its migration anomaly in polyacrylamide gels. The 159 bp long parC is strongly curved, with anomaly values (k-factors) close to 2. The properties of the parC curvature agree with those of other curved DNA sequences. parC contains two regions of 5-fold repeated iterons separated by 39 bp. We modified 4 bp within this intermediate sequence so that we could analyse the two 5-fold repeated regions independently. The analysis shows that the two repeat regions are not independently curved parts of parC but that the overall curvature is a property of the whole fragment. Since the centromere sequence of an E. coli plasmid as well as eukaryotic centromere sequences show DNA curvature, we speculate that curvature might be a general property of centromeres.


Subjects
Centromere/chemistry , DNA, Bacterial/chemistry , Escherichia coli/genetics , Plasmids/chemistry , Base Sequence , Electrophoresis, Polyacrylamide Gel , Molecular Sequence Data , Mutation , Nucleic Acid Conformation
15.
BMC Mol Biol ; 5: 14, 2004 Aug 26.
Article in English | MEDLINE | ID: mdl-15333140

ABSTRACT

BACKGROUND: Sequence periodicity with a period close to the DNA helical repeat is a very basic genomic property. This genomic feature has been demonstrated for many prokaryotic genomes. The Escherichia coli sequences display a period close to 11 base pairs. RESULTS: Here we demonstrate that practically only ApA/TpT dinucleotides contribute to overall dinucleotide periodicity in Escherichia coli. The noncoding sequences reveal this periodicity much more prominently than protein-coding sequences. The sequence periodicity of ApC/GpT, ApT and GpC dinucleotides along the Escherichia coli K-12 genome is also found to be located mainly within the intergenic regions. CONCLUSIONS: The observed concentration of the dinucleotide sequence periodicity in the intergenic regions of E. coli suggests that the periodicity is a typical property of prokaryotic intergenic regions. We suppose that this preferential distribution of dinucleotide periodicity serves many biological functions, first of all the regulation of transcription.


Subjects
DNA, Intergenic/genetics , Escherichia coli K12/genetics , Periodicity , Base Composition/genetics , DNA, Bacterial/genetics , Fourier Analysis , Genome, Bacterial
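Dinucleotide periodicity of the kind described in this abstract can be probed with a simple Fourier-style measurement, sketched here under stated assumptions. The sequence is turned into a 0/1 signal marking where the chosen dinucleotides start, the mean is subtracted, and the spectral power is evaluated at candidate periods, which may be non-integer (for example 10.3). The function names and the synthetic test sequence are hypothetical, and the abstract does not specify the exact transform or normalization the authors used.

```python
import cmath


def dinucleotide_power(seq, dinucs, period):
    """Spectral power of the dinucleotide-occurrence signal at one period.

    seq:    DNA string.
    dinucs: collection of dinucleotides to mark, e.g. ("AA", "TT").
    period: candidate period in bases (may be non-integer).
    """
    seq = seq.upper()
    signal = [1.0 if seq[i:i + 2] in dinucs else 0.0 for i in range(len(seq) - 1)]
    if not signal:
        return 0.0
    avg = sum(signal) / len(signal)  # remove the constant component
    total = sum(
        (x - avg) * cmath.exp(-2j * cmath.pi * n / period)
        for n, x in enumerate(signal)
    )
    return abs(total) ** 2 / len(signal)


def strongest_period(seq, dinucs, periods):
    """Return the candidate period with the highest spectral power."""
    return max(periods, key=lambda p: dinucleotide_power(seq, dinucs, p))


# Synthetic sequence with ApA planted exactly every 10 bases.
synthetic = "AACGCGCGCG" * 30
best = strongest_period(synthetic, ("AA", "TT"), [8, 9, 10, 11, 12])
```

Note that a perfectly regular signal has equal power at the divisors of its period (here 5 and 2), so candidate periods should be chosen with that in mind.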
16.
Genetics ; 166(3): 1437-50, 2004 Mar.
Article in English | MEDLINE | ID: mdl-15082561

ABSTRACT

Retroviruses and LTR retrotransposons comprise two long-terminal repeats (LTRs) bounding a central domain that encodes the products needed for reverse transcription, packaging, and integration into the genome. We describe a group of retrotransposons in 13 species and four genera of the grass tribe Triticeae, including barley, with long, approximately 4.4-kb LTRs formerly called Sukkula elements. The approximately 3.5-kb central domains include reverse transcriptase priming sites and are conserved in sequence but contain no open reading frames encoding typical retrotransposon proteins. However, they specify well-conserved RNA secondary structures. These features describe a novel group of elements, called large retrotransposon derivatives (LARDs). These appear to be members of the gypsy class of LTR retrotransposons. Although apparently nonautonomous, LARDs appear to be transcribed and can be recombinationally mapped due to the polymorphism of their insertion sites. They are dispersed throughout the genome in an estimated 1.3 × 10³ full-length copies and 1.16 × 10⁴ solo LTRs, indicating frequent recombinational loss of internal domains, as demonstrated also for the BARE-1 barley retrotransposon.


Subjects
Conserved Sequence , Genome, Plant , Hordeum/genetics , Retroelements , 3' Untranslated Regions , 5' Untranslated Regions , Base Sequence , DNA, Plant , Databases, Factual , Evolution, Molecular , In Situ Hybridization, Fluorescence , Long Interspersed Nucleotide Elements , Molecular Sequence Data , Polymerase Chain Reaction , Polymorphism, Genetic , RNA, Plant/chemistry , Triticum/genetics
17.
J Mol Evol ; 59(4): 520-7, 2004 Oct.
Article in English | MEDLINE | ID: mdl-15638463

ABSTRACT

The phenomenon of overlapping of various sequence messages in genomes is a puzzle for evolutionary theoreticians, geneticists, and sequence researchers. The overlapping is possible due to degeneracy of the messages, in particular, degeneracy of codons. It is often observed in organisms with a limited genome size that possess polymerases of low fidelity. The most accepted view considers the overlapping as a mechanism to increase the amount of information per unit length. Here we present a model that suggests a direct evolutionary advantage of message overlapping. Two opposing drives are considered: (a) reduction in the number of vulnerable points when the overlapping of two messages involves common critical points and (b) the cumulative compromising cost of coexistence of messages at the same site. Over a broad range of conditions the reduction of the target size prevails, thus making the overlapping of messages advantageous.


Subjects
Codon/genetics , Evolution, Molecular , Models, Genetic , Animals , Base Sequence , Genome, Bacterial , Genome, Viral , Humans , Likelihood Functions , Molecular Sequence Data , Mutation , Sequence Alignment
18.
In Silico Biol ; 4(3): 361-75, 2004.
Article in English | MEDLINE | ID: mdl-15724286

ABSTRACT

DNA curvature is known to play a biological role in gene regulation, in particular, initiation of transcription. We applied the software CURVATURE, based on the wedge model, to predict whether promoter regions of certain prokaryotes may be characterized by higher intrinsic DNA curvature located within or upstream of these regions. The main purpose was to verify our earlier hypothesis that DNA curvature plays a biological role in gene regulation in mesophilic as compared to hyperthermophilic prokaryotes, i.e., DNA curvature presumably has a functional adaptive significance determined by temperature selection. Therefore, we analyzed all available complete prokaryotic genomes. The analysis showed that there is a group of genomes with a relatively high average DNA curvature upstream of gene starts. Remarkably, all organisms of this group appeared to be mesophilic, which is a full confirmation of the former hypothesis. The conserved patterns of genomic curvature distribution across different mesophilic bacterial and archaeal genomes presented in this study provide a new, convincing indication that curved DNA is evolutionarily preserved and determined by temperature selection. Moreover, we found a rather peculiar property of hyperthermophilic prokaryotes: the coding regions are predicted to be significantly more curved than would be expected from their dinucleotide composition.


Subjects
DNA, Archaeal/chemistry , DNA, Bacterial/chemistry , Genome, Archaeal , Genome, Bacterial , Prokaryotic Cells , Nucleic Acid Conformation
19.
Acta Biotheor ; 51(2): 73-89, 2003.
Article in English | MEDLINE | ID: mdl-12870770

ABSTRACT

We introduce a novel, linguistic-like method of genome analysis. We propose a natural approach to characterizing genomic sequences based on occurrences of fixed-length words from a predefined, sufficiently large set of words (strings over the alphabet {A, C, G, T}). A measure based on this approach is called the compositional spectrum and is actually a histogram of imperfect word occurrences. Our results assert that the compositional spectrum is an overall characteristic of a long sequence, i.e., a complete genome or an uninterrupted part of a chromosome. This attribute is manifested in the similarity of spectra obtained on different stretches of the same genome, and simultaneously in a broad range of dissimilarities between spectral representations of different genomes. High flexibility characterizes this approach due to imperfect matching, and as a result sets of relatively long words can be considered. The proposed approach may have various applications in intra- and intergenomic sequence comparisons.


Subjects
DNA/genetics , Genomics , Sequence Analysis, DNA/methods , Statistics as Topic/methods , Algorithms , Animals , Archaea/genetics , Base Composition , Base Sequence , Chromosomes/genetics , DNA/chemistry , Eubacterium/genetics , Eukaryotic Cells , Humans , Linguistics , Models, Genetic
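The idea of a compositional spectrum as a histogram of imperfect word occurrences can be sketched as follows. This is a minimal illustration that assumes "imperfect matching" means a Hamming distance of at most a fixed number of mismatches between a word and a same-length window of the sequence; the abstract does not state the exact matching rule, word length, or word set, so those and the function names are assumptions.

```python
def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def compositional_spectrum(seq, words, max_mismatches):
    """Count, for each word, the sequence windows that match it with at
    most max_mismatches substitutions.

    seq:   DNA string over A, C, G, T.
    words: predefined list of words (they need not share one length).
    Returns a list of counts in the same order as words.
    """
    seq = seq.upper()
    spectrum = []
    for word in words:
        k = len(word)
        hits = sum(
            1
            for i in range(len(seq) - k + 1)
            if hamming(seq[i:i + k], word) <= max_mismatches
        )
        spectrum.append(hits)
    return spectrum


# Tiny example with two hypothetical words and one allowed mismatch.
spectrum = compositional_spectrum("ACGTACGT", ["ACGT", "AAAA"], 1)
```

Dividing each count by the total would turn the histogram into frequencies, so that spectra from sequences of different lengths can be compared.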
20.
Nucleic Acids Res ; 31(14): 4192-200, 2003 Jul 15.
Article in English | MEDLINE | ID: mdl-12853637

ABSTRACT

The coexistence of multiple codes in the genome of human immunodeficiency virus type 1 (HIV-1) was analyzed. We explored factors constraining the variability of the virus genome primarily in relation to conserved RNA secondary structures overlapping coding sequences, and used a simple combination of algorithms for RNA secondary structure prediction based on the nearest-neighbor thermodynamic rules and a statistical approach. In our previous study, we applied this combination to a non-redundant data set of env nucleotide sequences, confirmed the conserved secondary structure of the rev-responsive element (RRE) and found a new RNA structure in the first conserved (C1) region of the env gene. In this study, we analyzed the variability of putative RNA secondary structures inside the nef gene of HIV-1 by applying these algorithms to a non-redundant data set of 104 nef sequences retrieved from the Los Alamos HIV database, and predicted the existence of a novel functional RNA secondary structure in the beta3/beta4 regions of nef. The predicted RNA fold in the beta3/beta4 region of nef appears in two forms with different loop sizes. The loop of the first fold consists of seven nucleotides (positions 494-500), with consensus UCAAGCU appearing in 79% of sequences. The other has a five-base loop (positions 495-499) with consensus CAAGC. The difference in size between these two loops may reflect the difference between respective counterparts in the hairpin recognition. This may also have an adaptive biological significance.


Subjects
Gene Products, nef/genetics , HIV-1/genetics , Nucleic Acid Conformation , RNA, Viral/chemistry , Algorithms , DNA, Viral/genetics , Humans , RNA, Viral/genetics , nef Gene Products, Human Immunodeficiency Virus