1.
J Mol Biol ; 220(3): 659-71, 1991 Aug 05.
Article in English | MEDLINE | ID: mdl-1870126

ABSTRACT

We have developed an algorithm that automatically and reproducibly identifies potential tRNA genes in genomic DNA sequences, and we present a general strategy for testing the sensitivity of such algorithms. This algorithm is useful for flagging and characterizing long genomic sequences that have not been experimentally analyzed for functional regions, and for scanning nucleotide sequence databases for errors in the sequences and in the functional assignments associated with them. In an exhaustive scan of the GenBank database, 97.5% of the 744 known tRNA genes were correctly identified (true positives), and 42 previously unidentified sequences were predicted to be tRNAs. A detailed analysis of these latter predictions reveals that 16 of the 42 are very similar to known tRNA genes, and we predict that they do, in fact, code for tRNA, yielding a false-positive rate for the algorithm of 0.003%. The new algorithm and testing strategy are a considerable improvement over previously described strategies for recognizing tRNA genes, and they allow the detection of genes (including introns) embedded in long genomic sequences.


Subjects
DNA/genetics , Genes , Genetic Models , Transfer RNA/genetics , Algorithms , Animals , Anticodon/genetics , Base Sequence , Molecular Sequence Data , Nucleic Acid Conformation
2.
J Mol Biol ; 287(3): 467-84, 1999 Apr 02.
Article in English | MEDLINE | ID: mdl-10092453

ABSTRACT

We have undertaken the inventory and assembly of the ATP-binding cassette (ABC) transporter systems in the complete genome of Bacillus subtilis. We combined the identification of the three protein partners that make up an ABC transporter (nucleotide-binding domain, NBD; membrane-spanning domain, MSD; and solute-binding protein, SBP) with constraints on genetic organization. This strategy allowed the identification of 86 NBDs in 78 proteins, 103 MSD proteins and 37 SBPs. The analysis of transcriptional units allows the reconstruction of 59 ABC transporters, each of which includes at least one NBD and one MSD. A particular class of five dimeric ATPases was not associated with MSD partners and is assumed to be involved in either macrolide resistance or the regulation of translation elongation. In addition, we detected five genes encoding ATPases with no MSD-encoding gene in their neighborhood, and 11 operons that encode only the membrane and solute-binding proteins. On the basis of similarities, three ATP-binding proteins are proposed to energize ten incomplete systems, suggesting that one ATPase may be recruited by more than one transporter. Finally, we estimate that the B. subtilis genome encodes at least 78 ABC transporters, divided into 38 importers and 40 extruders. The ABC systems were further classified into 11 sub-families according to the tree obtained from the NBDs and the clustering of the MSDs and the SBPs. Comparisons with Escherichia coli show that extruders are over-represented in B. subtilis, corresponding to an expansion of the sub-families of antibiotic and drug resistance systems.


Subjects
ATP-Binding Cassette Transporters/classification , Bacillus subtilis/metabolism , ATP-Binding Cassette Transporters/genetics , ATP-Binding Cassette Transporters/metabolism , Bacillus subtilis/genetics , Binding Sites/genetics , Molecular Evolution , Bacterial Genome , Macromolecular Substances , Membranes/metabolism , Nucleotides/metabolism
3.
Hum Mol Genet ; 1(4): 259-67, 1992 Jul.
Article in English | MEDLINE | ID: mdl-1303196

ABSTRACT

The exon positions located at the 5' and 3' splice sites are involved in two functions: the accurate removal of introns from nuclear pre-messenger RNAs and the coding of amino acids. Therefore, at least two constraints act on these exon positions: the splicing constraint and the protein constraint. In the present study we investigate the effect of these constraints on a set of splice sites extracted from GenBank. The consensus matrices computed for each intron location in the reading frame show striking differences at the exon positions of the 5' splice sites. The results cannot be explained by the action of a single constraint but rather by competition between the splicing and protein constraints. Of the eight sites corresponding to codons located in the vicinity of the intron, three show an amino acid distribution that differs greatly from the average amino acid composition of proteins. Each of these three sites can be characterized by specific amino acids. The results show that the splicing constraint affects the local amino acid composition of the protein as long as the function of the protein is not disrupted.


Subjects
Proteins/genetics , Amino Acid Sequence , Amino Acids/analysis , Animals , Base Sequence , Codon/genetics , Consensus Sequence , DNA/genetics , Exons , Humans , Introns , Molecular Sequence Data , Proteins/chemistry , RNA Splicing/genetics , Small Nuclear RNA/genetics , Species Specificity
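The per-phase consensus matrices mentioned in the abstract above can be illustrated with a minimal Python sketch. It assumes each splice site is supplied as the last few exon nucleotides upstream of the 5' splice site together with the intron phase (0, 1 or 2); the function name `consensus_matrices` and the toy sites are hypothetical and are not the authors' data or code.

```python
from collections import Counter, defaultdict

def consensus_matrices(sites):
    """Build one position frequency matrix per intron phase.

    sites: iterable of (exon_tail, phase) pairs, where exon_tail holds the
    last exon nucleotides upstream of the 5' splice site and phase is the
    intron position in the reading frame (0, 1 or 2).
    Returns {phase: [{base: frequency} for each exon position]}.
    """
    by_phase = defaultdict(list)
    for tail, phase in sites:
        by_phase[phase].append(tail.upper())

    matrices = {}
    for phase, tails in by_phase.items():
        width = len(tails[0])
        if any(len(t) != width for t in tails):
            raise ValueError("exon tails within a phase must have equal length")
        matrices[phase] = []
        for pos in range(width):
            column = Counter(t[pos] for t in tails)
            matrices[phase].append({b: column[b] / len(tails) for b in "ACGT"})
    return matrices

# Hypothetical toy input: three phase-0 sites and two phase-1 sites.
toy_sites = [("CAG", 0), ("AAG", 0), ("CAG", 0), ("GAG", 1), ("CTG", 1)]
matrices = consensus_matrices(toy_sites)
print(matrices[0][2])  # {'A': 0.0, 'C': 0.0, 'G': 1.0, 'T': 0.0}
```

Comparing the phase 0, 1 and 2 matrices position by position is the kind of contrast the abstract describes; a real analysis would use far more sites and a wider exon window.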
4.
Comput Appl Biosci ; 3(4): 287-95, 1987 Nov.
Article in English | MEDLINE | ID: mdl-3134115

ABSTRACT

Protein-coding regions of a genome fragment can be predicted mathematically by studying variations in statistical properties or by searching for the signals characteristic of the junctions between coding and non-coding regions. We propose here a new statistical method using correspondence analysis. This method does not use any reference codon set but takes into account the homogeneity of codon usage along the studied genome fragment. The method was compared with previously published methods, especially Staden's 'codon usage method', and two examples are presented here. Applications to the analysis of prokaryotic operons and eukaryotic split genes are also discussed. Use of the method also revealed two previously undescribed structures: i) in the human prt gene, a strong triplet structure exists in a non-coding region; ii) in the human tp-a gene, codon usage is not uniform across the different exons.


Subjects
Codon , Mathematical Computing , Proteins/genetics , Messenger RNA , Algorithms , Animals , Base Sequence , Caenorhabditis/genetics , Escherichia coli/genetics , Exons , Humans , Hypoxanthine Phosphoribosyltransferase/genetics , Introns , Molecular Sequence Data , Tissue Plasminogen Activator/genetics
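Correspondence analysis compares row profiles using the chi-square metric. The Python sketch below computes only that metric for the codon-usage homogeneity idea in the abstract above: the chi-square distance between each window's codon profile and the profile of the whole fragment. The fixed reading frame starting at position 0, the window and step sizes, and the function names are assumptions for illustration; the published method goes on to a full correspondence analysis, which is not reproduced here.

```python
from collections import Counter

def codon_counts(seq, start, end):
    """Count non-overlapping codons in seq[start:end], starting at start."""
    return Counter(seq[i:i + 3] for i in range(start, end - 2, 3))

def codon_profile_distances(seq, window=300, step=150):
    """Chi-square distance of each window's codon profile to the global one.

    window and step must be multiples of 3 so every window stays in the
    same reading frame as the whole-fragment profile.
    """
    if window % 3 or step % 3:
        raise ValueError("window and step must be multiples of 3")
    seq = seq.upper()
    total = codon_counts(seq, 0, len(seq))
    n_total = sum(total.values())
    mean_profile = {c: n / n_total for c, n in total.items()}

    distances = []
    for start in range(0, len(seq) - window + 1, step):
        counts = codon_counts(seq, start, start + window)
        n = sum(counts.values())
        d2 = sum((counts[c] / n - p) ** 2 / p for c, p in mean_profile.items())
        distances.append((start, d2))
    return distances

# A toy fragment whose codon usage changes halfway through.
fragment = "ATG" * 100 + "GCC" * 100
print(codon_profile_distances(fragment))  # [(0, 1.0), (150, 0.0), (300, 1.0)]
```

Windows with a large distance are those whose codon usage departs from the fragment average, which is the kind of heterogeneity the abstract says the method exploits.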
5.
J Mol Microbiol Biotechnol ; 2(4): 501-4, 2000 Oct.
Article in English | MEDLINE | ID: mdl-11075924

ABSTRACT

We present the first release of a database devoted to ATP-binding cassette (ABC) protein domains (ABCdb). ABC proteins are involved in a wide variety of physiological processes in Archaea, Bacteria and Eukaryota, where they are encoded by large families of paralogous genes. The majority of ABC domains energize the transport of compounds across membranes. In bacteria, ABC transporters are involved in the uptake of a wide range of molecules and in mechanisms of virulence and antibiotic resistance. In eukaryotes, most of them are involved in drug resistance, and in human cells many are associated with diseases. Sequence analysis reveals that members of the ABC superfamily can be organized into sub-families and suggests that they have diverged from common ancestral forms. In this release, ABCdb includes the inventory and assembly of the ABC transporter systems of completely sequenced genomes. In addition to the protein entries, the database comprises information on functional domains, sequence motifs, predicted transmembrane segments, and signal peptides. It also includes a classification of the ABC systems into sub-families as well as a classification of the different partners of the systems. Evolutionary trees and specific sequence patterns are provided for each sub-family. The database has a powerful query system and is interfaced with the blastP2 program for similarity searches. ABCdb has been developed in the ACeDB format, a database system developed by Jean Thierry-Mieg and Richard Durbin. ABCdb can be accessed via the World Wide Web (http://ir2lcb.cnrs-mrs.fr/ABCdb/).


Subjects
ATP-Binding Cassette Transporters/chemistry , Databases as Topic , Animals , Archaea/physiology , Bacterial Physiological Phenomena , Eukaryotic Cells/physiology , Humans , Software
6.
Nucleic Acids Res ; 23(15): 2900-8, 1995 Aug 11.
Article in English | MEDLINE | ID: mdl-7659513

ABSTRACT

During the determination of DNA sequences, frameshift errors are not the most frequent but they are the most bothersome as they corrupt the amino acid sequence over several residues. Detection of such errors by sequence alignment is only possible when related sequences are found in the databases. To avoid this limitation, we have developed a new tool based on the distribution of non-overlapping 3-tuples or 6-tuples in the three frames of an ORF. The method relies upon the result of a correspondence analysis. It has been extensively tested on Bacillus subtilis and Saccharomyces cerevisiae sequences and has also been examined with human sequences. The results indicate that it can detect frameshift errors affecting as few as 20 bp with a low rate of false positives (no more than 1.0/1000 bp scanned). The proposed algorithm can be used to scan a large collection of data, but it is mainly intended for laboratory practice as a tool for checking the quality of the sequences produced during a sequencing project.


Subjects
Algorithms , Bacterial Proteins , DNA Mutational Analysis , Frameshift Mutation , Amino Acid Sequence , Bacillus subtilis/genetics , Base Sequence , Bacterial DNA/genetics , Fungal DNA/genetics , Discriminant Analysis , Humans , Molecular Sequence Data , Peptide Synthases/genetics , Reading Frames/genetics , Saccharomyces cerevisiae/genetics
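The input described in the abstract above, the distribution of non-overlapping 3-tuples in the three frames of an ORF, can be tabulated with a short Python sketch. The window and step sizes and the function name are hypothetical choices, 6-tuples are not shown, and the correspondence analysis that the authors apply to this kind of table is not reproduced here.

```python
from collections import Counter

def three_frame_table(orf, window=120, step=60):
    """Map (window_start, frame) to counts of non-overlapping 3-tuples.

    Frame 0 is the annotated reading frame of the ORF; frames 1 and 2 are
    shifted by one and two nucleotides within each window. step must be a
    multiple of 3 so that frame labels mean the same thing in every window.
    """
    if step % 3:
        raise ValueError("step must be a multiple of 3")
    orf = orf.upper()
    table = {}
    for start in range(0, len(orf) - window + 1, step):
        chunk = orf[start:start + window]
        for frame in range(3):
            table[(start, frame)] = Counter(
                chunk[i:i + 3] for i in range(frame, len(chunk) - 2, 3)
            )
    return table

table = three_frame_table("ATGGCC" * 40)
print(len(table))     # 9 rows: 3 windows x 3 frames
print(table[(0, 0)])  # Counter({'ATG': 20, 'GCC': 20})
```

In an error-free ORF the frame-0 rows should look alike along the sequence; a frameshift would be expected to make downstream windows resemble a shifted frame of the upstream ones, which is the kind of signal a correspondence analysis of this table can reveal.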
7.
J Theor Biol ; 166(1): 51-61, 1994 Jan 07.
Article in English | MEDLINE | ID: mdl-8145561

ABSTRACT

We have developed a fast filtering method for searching for repetitive sequences in databases that allows different families of repetitive elements to be identified simultaneously in a single scan. It discriminates between repetitive elements and unrelated sequences by comparing the frequencies of k-words found in both groups of sequences. The distance used to sort the sequences is based on a weighting of the k-words, obtained by performing a correspondence analysis on learning sets of carefully chosen sequences. The identification of Alu elements in human sequences is given as an illustration of the method. The Alu sequences are divided into four distinct groups of elements: the left and right monomers located on the direct and on the complementary strands. The results obtained on the test sets show that very good discrimination is achieved with a word length of 6 bp. Indeed, only 0.5% of the non-Alu sequences were incorrectly predicted as Alu elements at a threshold value allowing the identification of all Alu monomers. Misclassification of Alu monomers among the four groups (1.4%) occurs only when the left and right monomers are in the same orientation. Moreover, during the scanning of 63 GenBank sequences longer than 10 kb, all the Alu elements were correctly identified (616 elements) and only a few non-Alu sequences were wrongly predicted as Alu elements (22 fragments). There is a real need for this kind of method, since most repetitive elements are not annotated in database entries. The method can therefore be used for systematic screening of new sequences before their insertion into databases. It can also allow the creation of specific databases devoted to repetitive elements, a required step for any further analysis of those elements.


Subjects
Algorithms , Factual Databases , Repetitive Nucleic Acid Sequences , Animals , Base Sequence , Mathematics , Molecular Sequence Data
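The k-word filtering idea in the abstract above can be sketched in Python as a weighted distance between a sequence's k-word frequency profile and the mean profile of a learning set. Overlapping k-word counting, the squared-difference form of the distance, the `weights` dictionary and all function names are assumptions for illustration; in the published method the weights come from a correspondence analysis of the learning sets, which is not shown.

```python
from collections import Counter

def kword_profile(seq, k=6):
    """Relative frequencies of overlapping k-words in seq (assumed counting)."""
    seq = seq.upper()
    n = len(seq) - k + 1
    if n <= 0:
        return {}
    counts = Counter(seq[i:i + k] for i in range(n))
    return {w: c / n for w, c in counts.items()}

def mean_profile(profiles):
    """Average k-word profile of a learning set."""
    words = set().union(*profiles)
    return {w: sum(p.get(w, 0.0) for p in profiles) / len(profiles) for w in words}

def weighted_distance(profile, centre, weights=None):
    """Weighted squared distance; k-words missing from weights get weight 1."""
    weights = weights or {}
    words = set(profile) | set(centre)
    return sum(
        weights.get(w, 1.0) * (profile.get(w, 0.0) - centre.get(w, 0.0)) ** 2
        for w in words
    )

# Hypothetical learning set and queries (k = 3 keeps the toy strings short).
family = mean_profile([kword_profile(s, k=3) for s in ["ACGTACGTAC", "ACGTACGTTC"]])
for query in ["ACGTACGTAC", "TTTTTTTTTT"]:
    print(query, round(weighted_distance(kword_profile(query, k=3), family), 3))
```

A sequence would then be assigned to a family when its distance falls below a threshold chosen on test sets, as the abstract describes for the four groups of Alu monomers.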
8.
J Mol Evol ; 26(3): 198-204, 1987.
Article in English | MEDLINE | ID: mdl-3129567

ABSTRACT

The compositional distribution of coding sequences from five vertebrates (Xenopus, chicken, mouse, rat, and human) is shifted toward higher GC values compared with that of the DNA molecules (in the 35-85-kb size range) isolated from the corresponding genomes. This shift is due to the lower GC levels of intergenic sequences compared with coding sequences. In the cold-blooded vertebrate (Xenopus), the two distributions are similar in that GC-poor genes and GC-poor DNA molecules largely predominate. In contrast, in the warm-blooded vertebrates, GC-rich genes largely predominate over GC-poor genes, whereas GC-poor DNA molecules largely predominate over GC-rich DNA molecules. As a consequence, the genomes of warm-blooded vertebrates show a compositional gradient of gene concentration. The compositional distributions of coding sequences (as well as of DNA molecules) showed remarkable differences between chicken and mammals, and between mouse (or rat) and human. Differences were also detected in the compositional distribution of housekeeping and tissue-specific genes, the former being more abundant among GC-rich genes.


Subjects
Biological Evolution , Genes , Proto-Oncogenes , Vertebrates/genetics , Animals , Base Composition , Chickens/genetics , DNA/genetics , Haplorhini/genetics , Humans , Mice/genetics , Xenopus laevis/genetics
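The GC values compared in the abstract above are percentages of G+C bases. A minimal Python sketch of that calculation and of a binned compositional distribution follows; ignoring ambiguous bases such as N, the 2.5% default bin width and the function names are assumptions for illustration.

```python
from collections import Counter

def gc_percent(seq):
    """Percentage of G+C among A, C, G, T (ambiguous bases are ignored)."""
    seq = seq.upper()
    acgt = sum(seq.count(b) for b in "ACGT")
    if acgt == 0:
        raise ValueError("sequence has no unambiguous bases")
    return 100.0 * (seq.count("G") + seq.count("C")) / acgt

def gc_distribution(seqs, bin_width=2.5):
    """Histogram of GC percentages: {lower bin edge: number of sequences}."""
    bins = Counter(int(gc_percent(s) // bin_width) * bin_width for s in seqs)
    return dict(sorted(bins.items()))

print(gc_percent("ATGCNN"))                                     # 50.0
print(gc_distribution(["ATGC", "GCGC", "AATT"], bin_width=10))  # {0: 1, 50: 1, 100: 1}
```

Running `gc_distribution` separately over the coding sequences and over long genomic DNA fragments of one species gives the two distributions whose relative shift the abstract discusses.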
9.
Yeast ; 12(11): 1163-78, 1996 Sep 15.
Article in English | MEDLINE | ID: mdl-8896282

ABSTRACT

The authors of the first yeast chromosome sequence defined a minimum threshold of 100 codons, above which an open reading frame (ORF) is retained as a putative coding sequence. However, at least 58 yeast genes shorter than 100 codons have an assigned protein function. Therefore, the yeast genome may contain other tiny but functionally important genes that are discarded from analyses by this simple filtering rule. We have established discriminant functions from the in-phase hexamer frequencies of functional genes and of simulated ORFs derived from a stationary Markov chain model. Fifty-two of the 58 genes were recognized as coding ORFs by our discriminant method. The test was also applied to all the small ORFs (36 to 100 codons) found in the intergenic regions of published chromosomes. It retained 140 new potential tiny coding sequences, among which we identified seven new genes by similarity searches. Used in conjunction with similarity searches, our method can also highlight sequencing errors that disrupt the coding frame of longer ORFs. Through its ability to detect potential coding ORFs, this method can be a very useful tool for functional analysis.


Subjects
Fungal Genome , Saccharomyces cerevisiae/genetics , Base Sequence , Computers , Molecular Sequence Data , Open Reading Frames
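The ingredients named in the abstract above (small ORFs, in-phase hexamer frequencies, and sequences simulated from a Markov chain) can be sketched in Python. Assumptions: forward strand only, an ORF runs from the first ATG to the next in-frame stop, hexamers start at codon boundaries, and the Markov chain is first order with its first state drawn uniformly rather than from the stationary distribution. All function names are hypothetical, and the discriminant function itself is not shown.

```python
import random
from collections import Counter, defaultdict

STOP_CODONS = {"TAA", "TAG", "TGA"}

def find_orfs(seq, min_codons=36):
    """Yield (start, end) of forward-strand ORFs from the first ATG to the
    next in-frame stop, keeping those with at least min_codons codons
    (stop codon excluded)."""
    seq = seq.upper()
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    yield start, i + 3
                start = None

def in_phase_hexamers(orf):
    """Count hexamers that start at codon boundaries (0, 3, 6, ...)."""
    return Counter(orf[i:i + 6] for i in range(0, len(orf) - 5, 3))

def markov_chain(seq):
    """First-order transition probabilities estimated from seq."""
    counts = defaultdict(Counter)
    for a, b in zip(seq, seq[1:]):
        counts[a][b] += 1
    return {a: {b: n / sum(c.values()) for b, n in c.items()}
            for a, c in counts.items()}

def simulate(chain, length, rng):
    """Generate a sequence of the given length (>= 1) from the transition table."""
    state = rng.choice(sorted(chain))
    out = [state]
    while len(out) < length:
        nxt = chain.get(state)
        if nxt:
            state = rng.choices(list(nxt), weights=list(nxt.values()))[0]
        else:  # dead end: restart from a random state
            state = rng.choice(sorted(chain))
        out.append(state)
    return "".join(out)

gene = "ATG" + "GCT" * 40 + "TAA"
print(list(find_orfs(gene)))              # [(0, 126)]
print(in_phase_hexamers("ATGGCTGCTTAA"))  # Counter({'ATGGCT': 1, 'GCTGCT': 1, 'GCTTAA': 1})
background = simulate(markov_chain("ACGTTGCAACGT"), 60, random.Random(1))
print(len(background))                    # 60
```

Simulated ORFs for the counter-example class could be obtained by running `find_orfs` over long simulated sequences; the in-phase hexamer counts of real and simulated ORFs would then be the inputs to a discriminant analysis.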
10.
Comput Chem ; 23(3-4): 209-17, 1999 Jun 15.
Article in English | MEDLINE | ID: mdl-10404616

ABSTRACT

The prediction of coding sequences has received a lot of attention during the last decade. We can distinguish two kinds of methods, those that rely on training with sets of example and counter-example sequences, and those that exploit the intrinsic properties of the DNA sequences to be analyzed. The former are generally more powerful but their domains of application are limited by the availability of a training set. The latter avoid this drawback but can only be applied to sequences that are long enough to allow computation of the statistics. Here, we present a method that fills the gap between the two approaches. A learning step is applied using a set of sequences that are assumed to contain coding and non-coding regions, but with the boundaries of these regions unknown. A test step then uses the discriminant function obtained during the learning to predict coding regions in sequences from the same organism. The learning relies upon a correspondence analysis and prediction is presented on a graphical display. The method has been evaluated on a sample of yeast sequences, and the analysis of a set of expressed sequence tags from the Eucalyptus globulus-Pisolithus tinctorius ectomycorrhiza illustrates the relevance of the approach in its biological context.


Subjects
Factual Databases , Proteins/genetics , Algorithms , Amino Acid Sequence , Expressed Sequence Tags , Molecular Sequence Data , Proteins/chemistry , Amino Acid Sequence Homology
11.
J Mol Microbiol Biotechnol ; 2(2): 179-89, 2000 Apr.
Article in English | MEDLINE | ID: mdl-10939242

ABSTRACT

The recently identified bacterial Tat pathway is capable of exporting proteins that carry a peculiar twin-arginine signal peptide, in a folded conformation and independently of the Sec machinery. It is structurally and mechanistically similar to the delta pH-dependent pathway used for importing chloroplast proteins into the thylakoid. The tat genes are not ubiquitous and are absent from half of the completely sequenced bacterial genomes. The presence of the tat genes seems to correlate with genome size and with the presence of important enzymes bearing a twin-arginine signal peptide. A minimal Tat system requires one copy of tatA and one copy of tatC. The composition and gene order of a tat locus are generally conserved within a taxonomic group but vary considerably between groups, which would rule out recent acquisition of the Tat system by horizontal gene transfer. The tat genes are also found in the genomes of chloroplasts and plant mitochondria but are absent from animal mitochondrial genomes. The topology of evolutionary trees suggests a bacterial origin of the Tat system. In general, the twin-arginine signal peptide is capable of targeting any passenger protein to the Tat pathway. However, under certain circumstances, a structural signal carried by the mature part of a passenger protein can override the targeting information in a signal peptide. Tat systems show both substrate-Tat component specificity and species specificity. The pore size of the Tat channel is estimated to be between 5 and 9 nm. Operational models of the Tat system are proposed.


Subjects
Bacteria/genetics , Bacteria/metabolism , Bacterial Proteins/metabolism , Escherichia coli Proteins , tat Gene Products/metabolism , Membrane Transport Proteins , Arginine/metabolism , Bacterial Proteins/chemistry , Bacterial Proteins/genetics , Biological Evolution , Active Biological Transport , Carrier Proteins/chemistry , Carrier Proteins/genetics , Carrier Proteins/metabolism , tat Gene Products/chemistry , tat Gene Products/genetics , Bacterial Genes , tat Genes , Molecular Models , Operon , Protein Sorting Signals/chemistry , Protein Sorting Signals/genetics , Protein Sorting Signals/metabolism