Results 1 - 20 of 41
1.
Reproduction ; 138(2): 289-99, 2009 Aug.
Article in English | MEDLINE | ID: mdl-19465487

ABSTRACT

Genome reprogramming is the ability of a nucleus to modify its epigenetic characteristics and gene expression pattern when placed in a new environment. Low efficiency of mammalian cloning is attributed to the incomplete and aberrant nature of genome reprogramming after somatic cell nuclear transfer (SCNT) in oocytes. To date, the aspects of genome reprogramming critical for full-term development after SCNT remain poorly understood. To identify the key elements of this process, changes in gene expression during maternal-to-embryonic transition in normal bovine embryos and changes in gene expression between donor cells and SCNT embryos were compared using a new cDNA array dedicated to embryonic genome transcriptional activation in the bovine. Three groups of transcripts were mostly affected during somatic reprogramming: endogenous long terminal repeat (LTR) retrotransposons and mitochondrial transcripts were up-regulated, while genes encoding ribosomal proteins were down-regulated. These unexpected data demonstrate specific categories of transcripts most sensitive to somatic reprogramming and likely affecting viability of SCNT embryos. Importantly, massive transcriptional activation of LTR retrotransposons resulted in similar levels of their transcripts in SCNT and fertilized embryos. Taken together, these results open a new avenue in the quest to understand nuclear reprogramming driven by oocyte cytoplasm.


Subjects
Cellular Reprogramming , Embryo, Mammalian/physiology , Gene Expression Regulation, Developmental , Genome , Retroelements/genetics , Animals , Cattle , Cloning, Organism , Embryonic Development/genetics , Epigenesis, Genetic , Fertilization , Gene Expression , Gene Expression Profiling/methods , Nuclear Transfer Techniques , Oligonucleotide Array Sequence Analysis , Reverse Transcriptase Polymerase Chain Reaction
2.
Comput Chem ; 26(5): 511-9, 2002 Jul.
Article in English | MEDLINE | ID: mdl-12144179

ABSTRACT

In the framework of genome annotation, scientific literature is obviously the major source of biological knowledge. The aim of the work described in this paper is to exploit this source of data for the model plant Arabidopsis thaliana. The first step consisted in building a relevant dataset of bibliographic references for plant genomic research. Gene co-citations were then systematically annotated in this reference dataset, starting from the simple idea that if genes are cited in the same publication, they probably share some related functional properties. In order to deal with the synonymous gene name problem, a gene name reference list was built starting from A. thaliana SwissProt entries. This list was used to build clusters of co-cited genes by a single linkage procedure such that any gene in a given cluster possesses at least one co-cited partner in the same cluster. Analysis of the clusters demonstrates the biological consistency of this approach, with only very few fortuitous links. As an example, a cluster including genes related to flowering time is described in more depth in the paper. Finally, a graphical representation of each cluster was produced, which provides a convenient way to retrieve the genes (the nodes of the graphs) and the references in which they were co-cited (the edges of the graphs). All the results can be accessed at the URL http://chlora.Igi.infobiogen.fr:1234/bib_arath/.
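
The single linkage procedure described in this abstract amounts to finding connected components in a graph whose nodes are genes and whose edges are shared publications. The following is a minimal sketch of that idea, not the authors' implementation; the function name `cocitation_clusters`, the input layout, and the toy references are all assumptions made for illustration.

```python
from collections import defaultdict


def cocitation_clusters(papers):
    """Group genes into single-linkage clusters from co-citation data.

    `papers` maps a (hypothetical) reference ID to the set of gene names
    cited in that reference. Two genes are linked when at least one
    reference cites both; clusters are the connected components.
    """
    parent = {}

    def find(gene):
        parent.setdefault(gene, gene)
        while parent[gene] != gene:
            parent[gene] = parent[parent[gene]]  # path halving
            gene = parent[gene]
        return gene

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    for genes in papers.values():
        genes = sorted(genes)
        if len(genes) < 2:
            continue  # a gene cited alone has no co-cited partner
        for other in genes[1:]:
            union(genes[0], other)

    clusters = defaultdict(set)
    for gene in list(parent):
        clusters[find(gene)].add(gene)
    return sorted((sorted(c) for c in clusters.values()), key=lambda c: c[0])


# Invented toy data: the reference IDs and citation pattern are made up.
papers = {
    "ref1": {"FT", "CO"},
    "ref2": {"CO", "GI"},
    "ref3": {"TUB1", "TUB2"},
    "ref4": {"PHYA"},
}
print(cocitation_clusters(papers))
```

Every gene in a returned cluster has at least one co-cited partner in the same cluster, which is the property the abstract states. Synonym resolution against a reference name list is assumed to have happened before this step.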


Subjects
Arabidopsis/genetics , Computational Biology/methods , Databases, Bibliographic , Genome, Plant , Physical Chromosome Mapping/methods , Arabidopsis Proteins/genetics , Cluster Analysis , Databases, Protein , Genes, Plant/genetics , Internet , Knowledge , Molecular Sequence Data , Research , Species Specificity , Terminology as Topic
3.
Bioinformatics ; 18(3): 490-1, 2002 Mar.
Article in English | MEDLINE | ID: mdl-11934752

ABSTRACT

SUMMARY: GeneANOVA is an ANOVA-based software devoted to the analysis of gene expression data. AVAILABILITY: GeneANOVA is freely available on request for non-commercial use.


Subjects
Analysis of Variance , Gene Expression/genetics , Genetic Variation/genetics , Software , User-Computer Interface , Databases, Genetic , Oligonucleotide Array Sequence Analysis
4.
J Comput Biol ; 8(4): 381-99, 2001.
Article in English | MEDLINE | ID: mdl-11571074

ABSTRACT

We propose and study a new approach for the analysis of families of protein sequences. This method is related to the LogDet distances used in phylogenetic reconstructions; it can be viewed as an attempt to embed these distances into a multidimensional framework. The proposed method starts by associating a Markov matrix to each pairwise alignment deduced from a given multiple alignment. The central objects under consideration here are matrix-valued logarithms L of these Markov matrices, which exist under conditions that are compatible with fairly large divergence between the sequences. These logarithms allow us to compare data from a family of aligned proteins with simple models (in particular, continuous reversible Markov models) and to test the adequacy of such models. If one neglects fluctuations arising from the finite length of sequences, any continuous reversible Markov model with a single rate matrix Q over an arbitrary tree predicts that all the observed matrices L are multiples of Q. Our method exploits this fact, without relying on any tree estimation. We test this prediction on a family of proteins encoded by the mitochondrial genome of 26 multicellular animals, which include vertebrates, arthropods, echinoderms, molluscs, and nematodes. A principal component analysis of the observed matrices L shows that a single rate model can be used as a rough approximation to the data, but that systematic deviations from any such model are unmistakable and related to the evolutionary history of the species under consideration.
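
The prediction that all observed matrices L are multiples of Q follows from the standard form of a continuous-time Markov model. A short derivation is given here for orientation; the pairwise divergence time t_{ij} and the sequence indices i, j are notation assumed for this sketch and do not come from the abstract.

```latex
\begin{align}
P_{ij} &= \exp\!\left(t_{ij}\,Q\right), \\
L_{ij} &= \log P_{ij} = t_{ij}\,Q, \\
\operatorname{tr} L_{ij} &= \log\det P_{ij} = t_{ij}\,\operatorname{tr} Q .
\end{align}
```

The last line uses the identity log det P = tr log P and gives a scalar summary of the kind on which LogDet-type distances are built, which is one way to read the abstract's remark that the method embeds these distances in a multidimensional framework.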


Subjects
Computational Biology , Proteins/genetics , Sequence Alignment/statistics & numerical data , Computer Simulation , DNA, Mitochondrial/genetics , Evolution, Molecular , Markov Chains , Phylogeny , Sequence Analysis, Protein/statistics & numerical data , Stochastic Processes
5.
Mol Biol Evol ; 18(7): 1231-45, 2001 Jul.
Article in English | MEDLINE | ID: mdl-11420363

ABSTRACT

Previous analyses of retroviral nucleotide sequences suggest a so-called "scrambled duplicative stepwise molecular evolution" (many sectors with successive duplications/deletions of short and longer motifs) that could have stemmed from one or several starter tandemly repeated short sequence(s). In the present report, we tested this hypothesis by focusing on the long terminal repeats (LTRs) (and flanking sequences) of 24 human and 3 simian immunodeficiency viruses. By using a calculation strategy applicable to short sequences, we found consensus overrepresented motifs (often containing CTG or CAG) that were congruent with the previously defined "retroviral signature." We also show many local repetition patterns that are significant when compared with simply shuffled sequences. First- and second-order Markov chain analyses demonstrate that a major portion of the overrepresented oligonucleotides can be predicted from the dinucleotide compositions of the sequences, but by no means can biological mechanisms be deduced from these results: some of the listed local repetitions remain significant against dinucleotide-conserving shuffled sequences; together with previous results, this suggests that interspersed and/or local mononucleotide and oligonucleotide repetitions could have biased the dinucleotide compositions of the sequences. We searched for suggestive evolutionary patterns by scrutinizing a reliable multiple alignment of the 27 sequences. A manually constructed alignment based on homology blocks was in good agreement with the polypeptide alignment in the coding sectors and has been exhaustively assessed by using a multiplied alphabet obtained by the promising mathematical strategy called the N-block presentation (taking into account the environment of each nucleotide in a sequence). Sector by sector, we hypothesize many successive duplication/deletion scenarios that fit our previous evolutionary hypotheses. This suggests an important duplication/deletion role for the reverse transcriptase, particularly in inducing stuttering cryptic simplicity patterns.


Subjects
Evolution, Molecular , HIV Long Terminal Repeat , HIV-1/genetics , HIV-2/genetics , Algorithms , Animals , Base Sequence , Consensus Sequence , DNA, Viral/genetics , Humans , Models, Genetic , Sequence Alignment/methods , Sequence Alignment/statistics & numerical data , Sequence Deletion , Simian Immunodeficiency Virus/genetics
6.
Genome Biol ; 2(6): RESEARCH0019, 2001.
Article in English | MEDLINE | ID: mdl-11423008

ABSTRACT

BACKGROUND: In global gene expression profiling experiments, variation in the expression of genes of interest can often be hidden by general noise. To determine how biologically significant variation can be distinguished under such conditions we have analyzed the differences in gene expression when Bacillus subtilis is grown either on methionine or on methylthioribose as sulfur source. RESULTS: An unexpected link between arginine metabolism and sulfur metabolism was discovered, enabling us to identify a high-affinity arginine transport system encoded by the yqiXYZ genes. In addition, we tentatively identified a methionine/methionine sulfoxide transport system which is encoded by the operon ytmIJKLMhisP and is presumably used in the degradation of methionine sulfoxide to methane sulfonate for sulfur recycling. Experimental parameters resulting in systematic biases in gene expression were also uncovered. In particular, we found that the late competence operons comE, comF and comG were associated with subtle variations in growth conditions. CONCLUSIONS: Using variance analysis it is possible to distinguish between systematic biases and relevant gene-expression variation in transcriptome experiments. Co-variation of metabolic gene expression pathways was thus uncovered linking nitrogen and sulfur metabolism in B. subtilis.


Subjects
Arginine/metabolism , Bacillus subtilis/genetics , Bacillus subtilis/metabolism , Gene Expression Regulation, Bacterial , Methionine/analogs & derivatives , Methionine/metabolism , Bacillus subtilis/growth & development , Gene Expression Profiling , Genes, Bacterial , Genetic Variation , Membrane Transport Proteins/genetics , Mutation , Oligonucleotide Array Sequence Analysis , Operon , RNA, Bacterial/biosynthesis , Sulfur/metabolism , Thioglycosides/metabolism
7.
Comput Chem ; 23(3-4): 317-31, 1999 Jun 15.
Article in English | MEDLINE | ID: mdl-10627144

ABSTRACT

The Z-value is an attempt to estimate the statistical significance of a Smith-Waterman dynamic alignment score (SW-score) through the use of a Monte-Carlo process. It partly reduces the bias induced by the composition and length of the sequences. This paper is not a theoretical study on the distribution of SW-scores and Z-values. Rather, it presents a statistical analysis of Z-values on large datasets of protein sequences, leading to a law of probability that the experimental Z-values follow. First, we determine the relationships between the computed Z-value, an estimation of its variance and the number of randomizations in the Monte-Carlo process. Then, we illustrate that Z-values are less correlated to sequence lengths than SW-scores. Then we show that pairwise alignments, performed on 'quasi-real' sequences (i.e., randomly shuffled sequences of the same length and amino acid composition as the real ones) lead to Z-value distributions that statistically fit the extreme value distribution, more precisely the Gumbel distribution (global EVD, Extreme Value Distribution). However, for real protein sequences, we observe an over-representation of high Z-values. We determine first a cutoff value which separates these overestimated Z-values from those which follow the global EVD. We then show that the interesting part of the tail of distribution of Z-values can be approximated by another EVD (i.e., an EVD which differs from the global EVD) or by a Pareto law. This has been confirmed for all proteins analysed so far, whether extracted from individual genomes, or from the ensemble of five complete microbial genomes comprising altogether 16956 protein sequences.
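
The Monte-Carlo estimate described in this abstract can be sketched as follows: score the real pair, score the first sequence against shuffled copies of the second, and express the real score in standard deviations above the shuffled mean. This is an illustrative sketch under stated assumptions, not the paper's procedure: the scoring scheme (match/mismatch with a linear gap penalty instead of a substitution matrix), the choice to shuffle only the second sequence, and the names `sw_score` and `z_value` are all assumptions.

```python
import random
import statistics


def sw_score(a, b, match=2, mismatch=-1, gap=-2):
    """Smith-Waterman local alignment score with a linear gap penalty.

    The scoring parameters are illustrative assumptions.
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for ca in a:
        cur = [0]
        for j, cb in enumerate(b, start=1):
            diag = prev[j - 1] + (match if ca == cb else mismatch)
            cur.append(max(0, diag, prev[j] + gap, cur[j - 1] + gap))
            best = max(best, cur[j])
        prev = cur
    return best


def z_value(a, b, n_shuffles=100, seed=0):
    """Z-value of the SW-score of (a, b) against shuffled versions of b.

    Shuffling preserves the length and composition of b, as in the
    'quasi-real' sequences mentioned in the abstract.
    """
    rng = random.Random(seed)
    real = sw_score(a, b)
    letters = list(b)
    scores = []
    for _ in range(n_shuffles):
        rng.shuffle(letters)
        scores.append(sw_score(a, "".join(letters)))
    sd = statistics.stdev(scores)
    if sd == 0:
        raise ValueError("shuffled scores have zero variance")
    return (real - statistics.mean(scores)) / sd
```

The precision of the estimate depends on `n_shuffles`, which is the relationship between the computed Z-value, its variance, and the number of randomizations that the abstract says the paper examines first.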


Subjects
Genome, Bacterial , Genome, Fungal , Sequence Alignment , Computing Methodologies , Escherichia coli/genetics , Mathematics , Monte Carlo Method , Saccharomyces cerevisiae/genetics
8.
FEMS Microbiol Rev ; 22(4): 207-27, 1998 Oct.
Article in English | MEDLINE | ID: mdl-9862121

ABSTRACT

The present article describes a genome database reviewing gene-related knowledge of two model bacteria, Bacillus subtilis and Escherichia coli. The database, Indigo, is open through the World-Wide Web (http://indigo.genetique.uvsq.fr). The concept used for organising the data, the concept of neighbourhood, allows one to explore the database content in an efficient although somewhat unusual way. Here, genes are related to each other by a variety of neighbourhoods, including proximity in the chromosome, phylogenetic kinship, participation in a common metabolic pathway, common presence in an article of the literature, or similar use of the genetic code. Several examples illustrate how this concept of neighbourhood permits one to review the available knowledge about a given gene or gene family, and elaborate unexpected, but revealing, analyses about gene functions.


Subjects
Bacillus subtilis/genetics , Databases as Topic , Escherichia coli/genetics , Genome, Bacterial , Bacillus subtilis/classification , Escherichia coli/classification , Genes, Bacterial/genetics , Ligases/genetics , RNA, Transfer/classification
9.
Electrophoresis ; 19(4): 515-27, 1998 Apr.
Article in English | MEDLINE | ID: mdl-9588797

ABSTRACT

Present availability of the genomic text of bacteria allows assignment of known biological functions to many genes (typically, half of the genome's gene content). It is now time to try and predict new unexpected functions, using inductive procedures that allow correlating the content of the genomic text to possible biological functions. We show here that analysis of the genomes of Escherichia coli and Bacillus subtilis for the distribution of AGCT motifs predicts that genes exist for which the mRNA molecule can be translated as several different proteins synthesized after ribosomal frameshifting or hopping. Among these genes we found that several coded for the same function in E. coli and B. subtilis. We analyzed in depth the situation of the infB gene (experimentally known to specify synthesis of several proteins differing in their translation starts), the aceF/pdhC gene, the eno gene, and the rplI gene. In addition, genes specific to E. coli were also studied: ompA, ompF and tolA (predicting epigenetic variation that could help escape infection by phages or colicins).


Subjects
Bacillus subtilis/genetics , Escherichia coli Proteins , Escherichia coli/genetics , Frameshifting, Ribosomal , Genome, Bacterial , Microsatellite Repeats , Acetyltransferases/genetics , Amino Acid Sequence , Bacterial Outer Membrane Proteins/genetics , Bacterial Proteins/genetics , Consensus Sequence , Dihydrolipoyllysine-Residue Acetyltransferase , Glyceraldehyde-3-Phosphate Dehydrogenases/genetics , Mathematical Computing , Molecular Sequence Data , Peptide Initiation Factors/genetics , Phosphopyruvate Hydratase/genetics , Porins/genetics , Prokaryotic Initiation Factor-2 , Pyruvate Dehydrogenase Complex/genetics , RNA, Messenger , Ribosomal Proteins/genetics , Sequence Homology, Amino Acid
10.
Gene ; 209(1-2): GC1-GC38, 1998 Mar 16.
Article in English | MEDLINE | ID: mdl-9583944

ABSTRACT

In this paper, the relationship between codon usage and the physiological pattern of expression of a gene is investigated using a dataset of 815 nuclear genes of Arabidopsis thaliana. Factorial Correspondence Analysis, a commonly used multivariate statistical approach in codon usage analysis, was used in order to analyse codon usage bias gene by gene. The analysis reveals a single major trend in codon usage among genes in Arabidopsis. At one end of the trend lie genes with a highly G/C-biased codon usage. This group contains mainly photosynthetic and housekeeping genes which are known to encode the most abundant proteins of the vegetal cell. At the other extreme lie genes with a weaker A/T-biased codon usage. This group contains genes with various functions which most of the time exhibit a strong tissue-specific pattern of expression in relation, for example, to stress conditions. These observations were confirmed by the detailed analysis of codon usage in the multigene family of tubulins and appear to be general in plant species, even as distant from Arabidopsis thaliana as a monocotyledonous plant such as maize.


Subjects
Arabidopsis/genetics , Codon/genetics , Databases, Factual , Genes, Plant , Base Composition , Base Sequence , Genome, Plant , Plant Proteins/genetics
11.
DNA Res ; 4(4): 257-65, 1997 Aug 31.
Article in English | MEDLINE | ID: mdl-9405933

ABSTRACT

Analysis of the codon usage of genes coding for the structural components of the outer membrane in Escherichia coli is consistent with the requirement for high expression of these genes. Because porins (which constitute the major protein component of the outer membrane) and LPS (which constitutes the major outermost constituent of the outer membrane) are synthesized from genes displaying widely different codon usage, it is possible to investigate the origin of the outer membrane. The analysis predicts that the outer membrane might originate from a genome other than the genome coding for the major part of the cell. Such a special origin would explain, in structural terms, the likely lethality of porins if they were inadvertently inserted within the inner membrane, giving rise to the Gram-negative bacterial type, having an envelope comprising two membranes, instead of a single cytoplasmic membrane and a murein sacculus.


Subjects
Bacterial Outer Membrane Proteins/genetics , Codon , Escherichia coli/genetics , Genome, Bacterial , RNA, Transfer/genetics
12.
Comput Appl Biosci ; 13(2): 131-6, 1997 Apr.
Article in English | MEDLINE | ID: mdl-9146959

ABSTRACT

MOTIVATION: Compression algorithms can be used to analyse genetic sequences. A compression algorithm tests a given property on the sequence and uses it to encode the sequence: if the property is true, it reveals some structure of the sequence which can be described briefly; this yields a description of the sequence which is shorter than the sequence of nucleotides given in extenso. The more a sequence is compressed by the algorithm, the more significant is the property for that sequence. RESULTS: We present a compression algorithm that tests the presence of a particular type of dosDNA (defined ordered sequence-DNA): approximate tandem repeats of small motifs (i.e. of lengths < 4). This algorithm has been tested on four yeast chromosomes. The presence of approximate tandem repeats seems to be a uniform structural property of yeast chromosomes.


Subjects
Algorithms , DNA/genetics , Repetitive Sequences, Nucleic Acid , Base Sequence , Chromosomes, Fungal/genetics , DNA, Fungal/genetics , Evaluation Studies as Topic , Molecular Sequence Data , Saccharomyces cerevisiae/genetics , Sequence Analysis, DNA/methods , Sequence Analysis, DNA/statistics & numerical data , Software
13.
J Mol Evol ; 44(2): 214-25, 1997 Feb.
Article in English | MEDLINE | ID: mdl-9069182

ABSTRACT

A computer-assisted analysis was made of 24 complete nucleotide sequences selected from the vertebrate retroviruses to represent the ten viral groups. The conclusions of this analysis extend and strengthen the previously made hypothesis on the Moloney murine leukemia virus: The evolution of the nucleotide sequence appears to have occurred mainly through at least three overlapping levels of duplication: (1) The distributions of overrepresented (3-6)-mers are consistent with the universal rule of a trend toward TG/CT excess and with the persistence of a certain degree of symmetry between the two strands of DNA. This suggests one or several original tandemly repeated sequences and some inverted duplications. (2) The existence of two general core consensuses at the level of these (3-6)-mers supports the hypothesis of a common evolutionary origin of vertebrate retroviruses. Consensuses more specific to certain sequences are compatible with phylogenetic trees established independently. The consensuses could correspond to intermediary evolutionary stages. (3) Most of the (3-6)-mers with a significantly higher than average frequency appear to be internally repeated (with monomeric or oligomeric internal iterations) and seem to be at least partly the cause of the bias observed by other researchers at the level of retroviral nucleotide composition. They suggest a third evolutionary stage by slippage-like stepwise local duplications.


Subjects
Base Composition , Evolution, Molecular , Repetitive Sequences, Nucleic Acid/genetics , Retroviridae/genetics , Animals , Consensus Sequence/genetics , DNA, Viral/chemistry , DNA, Viral/genetics , Molecular Sequence Data , Oligodeoxyribonucleotides/genetics , Phylogeny , Vertebrates
14.
Genetica ; 100(1-3): 271-9, 1997.
Article in English | MEDLINE | ID: mdl-9440280

ABSTRACT

We investigate the nucleotide sequences of 23 retroelements (4 mammalian retroviruses, 1 human, 3 yeast, 2 plant, and 13 invertebrate retrotransposons) in terms of their oligonucleotide composition in order to address the problem of the relationship between retrotransposons and retroviruses, and the coadaptation of these retroelements to their host genomes. We have identified by computer analysis over-represented 3- through 6-mers in each sequence. Our results indicate that retrotransposons are heterogeneous in contrast to retroviruses, suggesting different modes of evolution by slippage-like mechanisms. Moreover, we have calculated the Observed/Expected number ratio for each of the 256 tetramers and analysed the data using a multivariate approach. The tetramer composition of retroelement sequences appears to be influenced by host genomic factors like methylase activity.
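
An Observed/Expected ratio for each of the 256 tetramers can be computed along the following lines. The abstract does not say which expectation model was used, so this sketch assumes the simplest one, in which the expected count of a tetramer is the number of windows multiplied by the product of its mononucleotide frequencies. The function name `tetramer_obs_exp` is hypothetical, and the input is assumed to contain only A, C, G and T.

```python
from collections import Counter
from itertools import product


def tetramer_obs_exp(seq):
    """Return {tetramer: observed/expected} for all 256 tetramers.

    Expected counts assume independent bases (zero-order model), which
    is an assumption of this sketch, not a detail from the abstract.
    """
    seq = seq.upper()
    n_windows = len(seq) - 3
    if n_windows < 1:
        raise ValueError("sequence must contain at least 4 bases")
    base_freq = {b: seq.count(b) / len(seq) for b in "ACGT"}
    # Overlapping windows: every start position contributes one tetramer.
    observed = Counter(seq[i:i + 4] for i in range(n_windows))
    ratios = {}
    for word in map("".join, product("ACGT", repeat=4)):
        expected = n_windows
        for base in word:
            expected *= base_freq[base]
        # None marks tetramers whose expected count is zero.
        ratios[word] = observed[word] / expected if expected > 0 else None
    return ratios
```

The resulting vector of 256 ratios per sequence is the kind of table that a multivariate method can then take as input.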


Subjects
Genome , Retroelements/genetics , Retroviridae/genetics , Animals , Base Sequence , Humans , Molecular Sequence Data , Phylogeny
16.
J Mol Biol ; 257(3): 574-85, 1996 Apr 05.
Article in English | MEDLINE | ID: mdl-8648625

ABSTRACT

This work reconsiders the GATC motif distribution in a 1.6 Mb segment of the Escherichia coli genome, compared to its distribution in phages and plasmids. At first sight the distribution of GATC words looks random. But when a realistic model of the chromosome (made of average genes having the same codon usage as in the real chromosome) is used as a theoretical reference, strong biases are observed. GATC pairs such as GATCNNGATC are under-represented while there is a strong positive selection for motifs separated by 10, 19, 70 and 1100 bp. The last class is the only one present in E. coli parasites. It can be ascribed to the triggering sequences of the long-patch mismatch repair system. The 6 bp class overlaps with the consensus of CAP (catabolite activator protein) and FNR (fumarate/nitrate regulator) binding sites, thus accounting for counter-selection. The other classes, which could be targets for a nucleic acid-binding protein, are almost always present inside protein coding sequences, and are members of clusters of GATC motifs. Analysis of the genes containing these motifs suggests that they correspond to a regulatory process monitoring the shift from anaerobic to aerobic growth conditions. In particular this regulation, closing down transcription of a large number of genes involved in intermediary metabolism, would be well suited for the cold and oxygen shift from the mammal's gut to the standard environmental conditions. In this process the methylation status of GATC clusters would be very important for tuning transcription, and a DNA binding protein, probably a member of the cold-shock protein family, would be needed for alleviating the effects mediated by slackening of the pace of methylation during the shift.
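
The raw material for an analysis of this kind is the distribution of distances between successive GATC occurrences. The sketch below tallies those distances; it measures spacing from motif start to motif start, so that GATCNNGATC counts as a spacing of 6, which appears to match the abstract's "6 bp class" but is an assumption. The function names are hypothetical, and the comparison against a model chromosome described in the abstract is not reproduced here.

```python
from collections import Counter


def motif_positions(seq, motif="GATC"):
    """Return the 0-based start positions of every occurrence of motif."""
    positions = []
    i = seq.find(motif)
    while i != -1:
        positions.append(i)
        i = seq.find(motif, i + 1)  # step by 1 so overlaps are not missed
    return positions


def spacing_histogram(seq, motif="GATC"):
    """Count start-to-start distances between consecutive motif hits."""
    positions = motif_positions(seq.upper(), motif)
    return Counter(b - a for a, b in zip(positions, positions[1:]))
```

Judging over- or under-representation of a spacing class would additionally require the same histogram computed on a reference, such as simulated sequences with matching codon usage.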


Subjects
Bacteriophages/genetics , DNA, Bacterial/genetics , Escherichia coli/genetics , Oligonucleotides/genetics , Plasmids/genetics , Sequence Analysis, DNA , Base Sequence , Molecular Sequence Data
17.
J Mol Biol ; 250(2): 123-7, 1995 Jul 07.
Article in English | MEDLINE | ID: mdl-7608964

ABSTRACT

The availability of specialized sequence databanks for Escherichia coli, Saccharomyces cerevisiae and Bacillus subtilis made it possible to build a set of 105 protein-coding genes that are homologous in these three species. An analysis of the triplets at both the nucleotide and amino acid level revealed that the codon bias of some amino acids is significantly higher at conserved rather than at non-conserved positions. Comparisons of homologous genes in E. coli and Salmonella typhimurium, and in S. cerevisiae and Drosophila melanogaster, led to the same conclusion. A special case was made for serine in E. coli, whose major codon is AGC for non-conserved and TCC for conserved residues. We interpret this observation as evidence that the primordial codons for serine were TCN, while codons AGY appeared later. This conclusion is substantiated by an analysis of the codon usage of catalytic serine residues in ancient, ubiquitous and essential proteins (ATP synthases and topoisomerases). It is shown that in these proteins the proportion of the catalytic serine residues coded by TCN is significantly higher than the one expected from the overall codon usage of serine residues.
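
The two serine codon families contrasted in this abstract, TCN (TCT, TCC, TCA, TCG) and AGY (AGT, AGC), can be tallied in a coding sequence as sketched below. This is only the counting step; splitting residues into conserved and non-conserved positions, as the paper does, would need a multiple alignment and is not shown. The function name `serine_codon_usage` and the example sequence are invented for illustration.

```python
from collections import Counter

SERINE_TCN = {"TCT", "TCC", "TCA", "TCG"}
SERINE_AGY = {"AGT", "AGC"}


def serine_codon_usage(cds):
    """Count serine codons by family in an in-frame coding sequence."""
    cds = cds.upper()
    counts = Counter()
    # Walk complete codons only; a trailing partial codon is ignored.
    for i in range(0, len(cds) - len(cds) % 3, 3):
        codon = cds[i:i + 3]
        if codon in SERINE_TCN:
            counts["TCN"] += 1
        elif codon in SERINE_AGY:
            counts["AGY"] += 1
    return counts


# Invented example: ATG TCC AGC TCT TCA TAA
print(serine_codon_usage("ATGTCCAGCTCTTCATAA"))
```

Comparing the TCN share among a chosen subset of residues, such as catalytic serines, with the share over all serine residues gives the kind of contrast the abstract reports.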


Subjects
Biological Evolution , Codon/genetics , Conserved Sequence/genetics , Genetic Code/genetics , Serine/genetics , Amino Acid Sequence , Bacillus subtilis/genetics , Base Sequence , Escherichia coli/genetics , Saccharomyces cerevisiae/genetics
18.
C R Acad Sci III ; 318(5): 599-608, 1995 May.
Article in English | MEDLINE | ID: mdl-7671006

ABSTRACT

Complex genomes contain numerous simple sequence repeats, the biological significance of which remains obscure. Recently it has been shown that several human diseases are the result of changes in such sequences. Thus it has become urgent to undertake a systematic study of their properties. We have set the task of describing as completely as possible the set of sequences which contain bases organized according to symmetrical elements, the dosDNA: defined ordered sequence. Examination of local anomalies in dinucleotide composition serves to identify dosDNA zones in the genome. The study of chromosomes II, III, VIII and XI of Saccharomyces cerevisiae reveals these dosDNA zones comprise about 2% of the genome. They are regularly distributed along the chromosomes, regardless of the functional significance of the sequence. A more detailed analysis of dosDNA segments seems to indicate that simple repeats are the consequence of local properties of the chromosome, and not due to any motif in particular.


Subjects
Chromosomes, Fungal/genetics , DNA, Fungal/chemistry , Repetitive Sequences, Nucleic Acid , Saccharomyces cerevisiae/genetics , Base Sequence , Genetic Variation , Molecular Sequence Data
19.
Comput Appl Biosci ; 10(4): 401-8, 1994 Jul.
Article in English | MEDLINE | ID: mdl-7804872

ABSTRACT

A program for assembling sequences by using a global approach has been developed. By successive steps, a more and more precise classification of DNA fragments permits the positioning of the sequences on the contig; after having detected the pairs of overlapping sequences, groups are formed such that all sequences in a group overlap. Sequences common to several groups enable the groups to be ordered in a series. Ambiguities in the order of groups can arise at this stage, due to the presence of repeated fragments; different solutions are then proposed. Putting the groups into order leads to a preclassification of sequences. The fragments are then aligned by group, by searching for words common to all sequences in the group, and using an algorithm of dynamic programming. A detailed example on a set of nine sequences accompanies the description of the method.


Subjects
Sequence Analysis, DNA/methods , Software , Algorithms , Base Sequence , DNA/genetics , DNA, Complementary/genetics , Genetic Techniques , Molecular Sequence Data , Sequence Alignment/methods , Sequence Alignment/statistics & numerical data , Sequence Analysis, DNA/statistics & numerical data
20.
Microbiol Rev ; 57(3): 623-54, 1993 Sep.
Article in English | MEDLINE | ID: mdl-8246843

ABSTRACT

Several data libraries have been created to organize all the data obtained worldwide about the Escherichia coli genome. Because the known data now amount to more than 40% of the whole genome sequence, it has become necessary to organize the data in such a way that appropriate procedures can associate knowledge produced by experiments about each gene to its position on the chromosome and its relation to other relevant genes, for example. In addition, global properties of genes, affected by the introduction of new entries, should be present as appropriate description fields. A data base, implemented on Macintosh by using the data base management system 4th Dimension, is described. It is constructed around a core constituted by known contigs of E. coli sequences and links data collected in general libraries (unmodified) to data associated with evolving knowledge (with modifiable fields). Biologically significant results obtained through the coupling of appropriate procedures (learning or statistical data analysis) are presented. The data base is available through a 4th Dimension runtime and through FTP on Internet. It has been regularly updated and will be systematically linked to other E. coli data bases (M. Kroger, R. Wahl, G. Schachtel, and P. Rice, Nucleic Acids Res. 20(Suppl.):2119-2144, 1992; K. E. Rudd, W. Miller, C. Werner, J. Ostell, C. Tolstoshev, and S. G. Satterfield, Nucleic Acids Res. 19:637-647, 1991) in the near future.


Subjects
Databases, Factual , Escherichia coli/genetics , Genome, Bacterial , Bacterial Proteins/genetics , Base Sequence , Chromosome Mapping , Chromosomes, Bacterial , DNA Replication , Data Display , Database Management Systems , Genes, Bacterial , Models, Theoretical , Molecular Sequence Data , Transcription, Genetic