Results 1 - 20 of 41
1.
Reproduction ; 138(2): 289-99, 2009 Aug.
Article in English | MEDLINE | ID: mdl-19465487

ABSTRACT

Genome reprogramming is the ability of a nucleus to modify its epigenetic characteristics and gene expression pattern when placed in a new environment. Low efficiency of mammalian cloning is attributed to the incomplete and aberrant nature of genome reprogramming after somatic cell nuclear transfer (SCNT) in oocytes. To date, the aspects of genome reprogramming critical for full-term development after SCNT remain poorly understood. To identify the key elements of this process, changes in gene expression during maternal-to-embryonic transition in normal bovine embryos and changes in gene expression between donor cells and SCNT embryos were compared using a new cDNA array dedicated to embryonic genome transcriptional activation in the bovine. Three groups of transcripts were mostly affected during somatic reprogramming: endogenous long terminal repeat (LTR) retrotransposons and mitochondrial transcripts were up-regulated, while genes encoding ribosomal proteins were down-regulated. These unexpected data demonstrate specific categories of transcripts most sensitive to somatic reprogramming and likely affecting viability of SCNT embryos. Importantly, massive transcriptional activation of LTR retrotransposons resulted in similar levels of their transcripts in SCNT and fertilized embryos. Taken together, these results open a new avenue in the quest to understand nuclear reprogramming driven by oocyte cytoplasm.


Subject(s)
Cellular Reprogramming , Embryo, Mammalian/physiology , Gene Expression Regulation, Developmental , Genome , Retroelements/genetics , Animals , Cattle , Cloning, Organism , Embryonic Development/genetics , Epigenesis, Genetic , Fertilization , Gene Expression , Gene Expression Profiling/methods , Nuclear Transfer Techniques , Oligonucleotide Array Sequence Analysis , Reverse Transcriptase Polymerase Chain Reaction
2.
Comput Chem ; 26(5): 511-9, 2002 Jul.
Article in English | MEDLINE | ID: mdl-12144179

ABSTRACT

In the framework of genome annotation, the scientific literature is the major source of biological knowledge. The aim of the work described in this paper is to exploit this source of data for the model plant Arabidopsis thaliana. The first step consisted in assembling a relevant dataset of bibliographic references for plant genomic research. Gene co-citations were then systematically annotated in this reference dataset, starting from the simple idea that genes cited in the same publication probably share some related functional properties. To deal with the problem of synonymous gene names, a reference list of gene names was built from A. thaliana SwissProt entries. This list was used to build clusters of co-cited genes by a single-linkage procedure, such that any gene in a given cluster has at least one co-cited partner in the same cluster. Analysis of the clusters demonstrates the biological consistency of this approach, with only very few fortuitous links. As an example, a cluster including genes related to flowering time is described in more detail in the paper. Finally, a graphical representation of each cluster was produced, which provides a convenient way to retrieve the genes (the nodes of the graphs) and the references in which they were co-cited (the edges of the graphs). All the results can be accessed at the URL http://chlora.Igi.infobiogen.fr:1234/bib_arath/.


Subject(s)
Arabidopsis/genetics , Computational Biology/methods , Databases, Bibliographic , Genome, Plant , Physical Chromosome Mapping/methods , Arabidopsis Proteins/genetics , Cluster Analysis , Databases, Protein , Genes, Plant/genetics , Internet , Knowledge , Molecular Sequence Data , Research , Species Specificity , Terminology as Topic
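
The single-linkage clustering of co-cited genes described in the abstract above amounts to finding connected components in a graph whose nodes are genes and whose edges are shared publications. The following is a minimal sketch of that idea, not the authors' implementation: the function name, the input layout, and the toy reference IDs and gene groupings are all assumptions made for illustration, and gene synonyms are assumed to have been mapped to one reference name beforehand.

```python
from collections import defaultdict
from itertools import combinations


def cocitation_clusters(papers):
    """Group genes so that each gene shares at least one paper with
    another gene of its cluster (single linkage = connected components).

    `papers` maps a hypothetical reference ID to the set of gene names
    it cites, after synonyms have been mapped to one reference name.
    """
    parent = {}

    def find(gene):
        # Walk up to the cluster representative, compressing the path.
        parent.setdefault(gene, gene)
        while parent[gene] != gene:
            parent[gene] = parent[parent[gene]]
            gene = parent[gene]
        return gene

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    for genes in papers.values():
        for a, b in combinations(sorted(genes), 2):
            union(a, b)  # co-cited in the same paper: link them

    clusters = defaultdict(set)
    for gene in parent:
        clusters[find(gene)].add(gene)
    # Genes cited alone never enter `parent`, so they form no cluster.
    return sorted(sorted(c) for c in clusters.values())


# Toy input; the reference IDs and gene groupings are made up.
papers = {
    "ref1": {"FLC", "FT"},
    "ref2": {"FT", "CO"},
    "ref3": {"CAB1", "RBCS"},
    "ref4": {"PHYA"},
}
clusters = cocitation_clusters(papers)
```

In this toy input, FLC and CO end up in the same cluster even though no single reference cites both, because each is co-cited with FT. That chaining is the defining property of single linkage.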
3.
Bioinformatics ; 18(3): 490-1, 2002 Mar.
Article in English | MEDLINE | ID: mdl-11934752

ABSTRACT

SUMMARY: GeneANOVA is ANOVA-based software devoted to the analysis of gene expression data. AVAILABILITY: GeneANOVA is freely available on request for non-commercial use.


Subject(s)
Analysis of Variance , Gene Expression/genetics , Genetic Variation/genetics , Software , User-Computer Interface , Databases, Genetic , Oligonucleotide Array Sequence Analysis
4.
J Comput Biol ; 8(4): 381-99, 2001.
Article in English | MEDLINE | ID: mdl-11571074

ABSTRACT

We propose and study a new approach for the analysis of families of protein sequences. This method is related to the LogDet distances used in phylogenetic reconstructions; it can be viewed as an attempt to embed these distances into a multidimensional framework. The proposed method starts by associating a Markov matrix to each pairwise alignment deduced from a given multiple alignment. The central objects under consideration here are matrix-valued logarithms L of these Markov matrices, which exist under conditions that are compatible with fairly large divergence between the sequences. These logarithms allow us to compare data from a family of aligned proteins with simple models (in particular, continuous reversible Markov models) and to test the adequacy of such models. If one neglects fluctuations arising from the finite length of sequences, any continuous reversible Markov model with a single rate matrix Q over an arbitrary tree predicts that all the observed matrices L are multiples of Q. Our method exploits this fact, without relying on any tree estimation. We test this prediction on a family of proteins encoded by the mitochondrial genome of 26 multicellular animals, which include vertebrates, arthropods, echinoderms, molluscs, and nematodes. A principal component analysis of the observed matrices L shows that a single rate model can be used as a rough approximation to the data, but that systematic deviations from any such model are unmistakable and related to the evolutionary history of the species under consideration.


Subject(s)
Computational Biology , Proteins/genetics , Sequence Alignment/statistics & numerical data , Computer Simulation , DNA, Mitochondrial/genetics , Evolution, Molecular , Markov Chains , Phylogeny , Sequence Analysis, Protein/statistics & numerical data , Stochastic Processes
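
The prediction in the abstract above, that all observed matrices L are multiples of Q, follows from a standard property of continuous-time Markov chains. As a brief sketch, with notation introduced here for illustration only: let $P_{ij}$ be the Markov matrix for the pair of sequences $i$ and $j$, and $t_{ij}$ the total path length separating them on the tree. For a stationary reversible model with a single rate matrix $Q$, and neglecting finite-length fluctuations,

```latex
\begin{align}
P_{ij} &= e^{t_{ij} Q}, \\
L_{ij} &= \log P_{ij} = t_{ij}\, Q .
\end{align}
```

Every $L_{ij}$ is then proportional to the same $Q$, whatever the tree topology, which is why the test needs no tree estimation.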
5.
Genome Biol ; 2(6): RESEARCH0019, 2001.
Article in English | MEDLINE | ID: mdl-11423008

ABSTRACT

BACKGROUND: In global gene expression profiling experiments, variation in the expression of genes of interest can often be hidden by general noise. To determine how biologically significant variation can be distinguished under such conditions we have analyzed the differences in gene expression when Bacillus subtilis is grown either on methionine or on methylthioribose as sulfur source. RESULTS: An unexpected link between arginine metabolism and sulfur metabolism was discovered, enabling us to identify a high-affinity arginine transport system encoded by the yqiXYZ genes. In addition, we tentatively identified a methionine/methionine sulfoxide transport system which is encoded by the operon ytmIJKLMhisP and is presumably used in the degradation of methionine sulfoxide to methane sulfonate for sulfur recycling. Experimental parameters resulting in systematic biases in gene expression were also uncovered. In particular, we found that the late competence operons comE, comF and comG were associated with subtle variations in growth conditions. CONCLUSIONS: Using variance analysis it is possible to distinguish between systematic biases and relevant gene-expression variation in transcriptome experiments. Co-variation of metabolic gene expression pathways was thus uncovered linking nitrogen and sulfur metabolism in B. subtilis.


Subject(s)
Arginine/metabolism , Bacillus subtilis/genetics , Bacillus subtilis/metabolism , Gene Expression Regulation, Bacterial , Methionine/analogs & derivatives , Methionine/metabolism , Bacillus subtilis/growth & development , Gene Expression Profiling , Genes, Bacterial , Genetic Variation , Membrane Transport Proteins/genetics , Mutation , Oligonucleotide Array Sequence Analysis , Operon , RNA, Bacterial/biosynthesis , Sulfur/metabolism , Thioglycosides/metabolism
6.
Mol Biol Evol ; 18(7): 1231-45, 2001 Jul.
Article in English | MEDLINE | ID: mdl-11420363

ABSTRACT

Previous analyses of retroviral nucleotide sequences suggest a so-called "scrambled duplicative stepwise molecular evolution" (many sectors with successive duplications/deletions of short and longer motifs) that could have stemmed from one or several starter tandemly repeated short sequence(s). In the present report, we tested this hypothesis by focusing on the long terminal repeats (LTRs) (and flanking sequences) of 24 human and 3 simian immunodeficiency viruses. By using a calculation strategy applicable to short sequences, we found consensus overrepresented motifs (often containing CTG or CAG) that were congruent with the previously defined "retroviral signature." We also show many local repetition patterns that are significant when compared with simply shuffled sequences. First- and second-order Markov chain analyses demonstrate that a major portion of the overrepresented oligonucleotides can be predicted from the dinucleotide compositions of the sequences, but by no means can biological mechanisms be deduced from these results: some of the listed local repetitions remain significant against dinucleotide-conserving shuffled sequences; together with previous results, this suggests that interspersed and/or local mononucleotide and oligonucleotide repetitions could have biased the dinucleotide compositions of the sequences. We searched for suggestive evolutionary patterns by scrutinizing a reliable multiple alignment of the 27 sequences. A manually constructed alignment based on homology blocks was in good agreement with the polypeptide alignment in the coding sectors and has been exhaustively assessed by using a multiplied alphabet obtained by the promising mathematical strategy called the N-block presentation (taking into account the environment of each nucleotide in a sequence). Sector by sector, we hypothesize many successive duplication/deletion scenarios that fit our previous evolutionary hypotheses. This suggests an important duplication/deletion role for the reverse transcriptase, particularly in inducing stuttering cryptic simplicity patterns.


Subject(s)
Evolution, Molecular , HIV Long Terminal Repeat , HIV-1/genetics , HIV-2/genetics , Algorithms , Animals , Base Sequence , Consensus Sequence , DNA, Viral/genetics , Humans , Models, Genetic , Sequence Alignment/methods , Sequence Alignment/statistics & numerical data , Sequence Deletion , Simian Immunodeficiency Virus/genetics
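
To make concrete what it means for an overrepresented oligonucleotide to be "predicted from the dinucleotide compositions", as the abstract above puts it, the sketch below estimates the expected count of a word under a first-order Markov model and compares it with the observed count. This uses the usual textbook estimator, not necessarily the calculation strategy of the paper; the function names and the toy sequence are assumptions.

```python
from collections import Counter


def count_words(seq, k):
    """Count overlapping words of length k in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def expected_first_order(word, seq):
    """Expected count of `word` under a first-order Markov model,
    estimated from dinucleotide and mononucleotide counts:
    E = prod N(w_i w_{i+1}) / prod N(w_i) over interior letters."""
    di = count_words(seq, 2)
    mono = count_words(seq, 1)
    expected = 1.0
    for i in range(len(word) - 1):
        expected *= di[word[i:i + 2]]
    for base in word[1:-1]:
        if mono[base] == 0:
            return 0.0
        expected /= mono[base]
    return expected


seq = "CTGCTGCAGCTGAACTGCAG"  # made-up toy sequence
observed = count_words(seq, 3)["CTG"]
expected = expected_first_order("CTG", seq)
ratio = observed / expected if expected else float("inf")
```

A ratio close to 1 means the word's frequency is already explained by the dinucleotide composition. Only ratios well above 1 point to repetition beyond that level, which is the distinction the abstract draws.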
7.
Comput Chem ; 23(3-4): 317-31, 1999 Jun 15.
Article in English | MEDLINE | ID: mdl-10627144

ABSTRACT

The Z-value is an attempt to estimate the statistical significance of a Smith-Waterman dynamic alignment score (SW-score) through the use of a Monte-Carlo process. It partly reduces the bias induced by the composition and length of the sequences. This paper is not a theoretical study on the distribution of SW-scores and Z-values. Rather, it presents a statistical analysis of Z-values on large datasets of protein sequences, leading to a law of probability that the experimental Z-values follow. First, we determine the relationships between the computed Z-value, an estimation of its variance and the number of randomizations in the Monte-Carlo process. Then, we illustrate that Z-values are less correlated to sequence lengths than SW-scores. Then we show that pairwise alignments, performed on 'quasi-real' sequences (i.e., randomly shuffled sequences of the same length and amino acid composition as the real ones) lead to Z-value distributions that statistically fit the extreme value distribution, more precisely the Gumbel distribution (global EVD, Extreme Value Distribution). However, for real protein sequences, we observe an over-representation of high Z-values. We determine first a cutoff value which separates these overestimated Z-values from those which follow the global EVD. We then show that the interesting part of the tail of distribution of Z-values can be approximated by another EVD (i.e., an EVD which differs from the global EVD) or by a Pareto law. This has been confirmed for all proteins analysed so far, whether extracted from individual genomes, or from the ensemble of five complete microbial genomes comprising altogether 16956 protein sequences.


Subject(s)
Genome, Bacterial , Genome, Fungal , Sequence Alignment , Computing Methodologies , Escherichia coli/genetics , Mathematics , Monte Carlo Method , Saccharomyces cerevisiae/genetics
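
The Monte-Carlo Z-value described in the abstract above can be sketched as follows: score the real pair, score the first sequence against shuffled copies of the second, and express the real score in standard deviations from the shuffled mean. This is a minimal illustration under stated assumptions, not the paper's procedure: the match/mismatch/gap values, the choice to shuffle only one sequence, the number of shuffles, and the toy peptides are all hypothetical.

```python
import random
import statistics


def sw_score(a, b, match=2, mismatch=-1, gap=-2):
    """Smith-Waterman local alignment score with a linear gap penalty.
    The scoring values are illustrative, not those of the paper."""
    prev = [0] * (len(b) + 1)
    best = 0
    for ca in a:
        curr = [0]
        for j, cb in enumerate(b, start=1):
            diag = prev[j - 1] + (match if ca == cb else mismatch)
            score = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(score)
            best = max(best, score)
        prev = curr
    return best


def z_value(a, b, n_shuffles=100, seed=0):
    """Z = (S - mean) / sd, where mean and sd come from scores of `a`
    against shuffled copies of `b` (same length and composition)."""
    rng = random.Random(seed)
    real = sw_score(a, b)
    letters = list(b)
    scores = []
    for _ in range(n_shuffles):
        rng.shuffle(letters)
        scores.append(sw_score(a, "".join(letters)))
    mean = statistics.mean(scores)
    sd = statistics.pstdev(scores)
    if sd == 0:
        return float("inf") if real > mean else 0.0
    return (real - mean) / sd


# Made-up toy peptides sharing a common core.
z = z_value("MKTAYIAKQRQISFVKSHFSRQ", "GGMKTAYIAKQRQISFVKAAGG")
```

Because shuffling preserves length and composition, the Z-value removes part of the bias those two factors introduce into raw SW-scores. Its precision depends on `n_shuffles`, which is the relationship the abstract says the authors quantify.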
8.
FEMS Microbiol Rev ; 22(4): 207-27, 1998 Oct.
Article in English | MEDLINE | ID: mdl-9862121

ABSTRACT

The present article describes a genome database reviewing gene-related knowledge of two model bacteria, Bacillus subtilis and Escherichia coli. The database, Indigo, is open through the World-Wide Web (http://indigo.genetique.uvsq.fr). The concept used for organising the data, the concept of neighbourhood, allows one to explore the database content in an efficient although somewhat unusual way. Here, genes are related to each other by a variety of neighbourhoods, including proximity in the chromosome, phylogenetic kinship, participation in a common metabolic pathway, common presence in an article of the literature, or similar use of the genetic code. Several examples illustrate how this concept of neighbourhood permits one to review the available knowledge about a given gene or gene family, and elaborate unexpected, but revealing, analyses about gene functions.


Subject(s)
Bacillus subtilis/genetics , Databases as Topic , Escherichia coli/genetics , Genome, Bacterial , Bacillus subtilis/classification , Escherichia coli/classification , Genes, Bacterial/genetics , Ligases/genetics , RNA, Transfer/classification
9.
Gene ; 209(1-2): GC1-GC38, 1998 Mar 16.
Article in English | MEDLINE | ID: mdl-9583944

ABSTRACT

In this paper, the relationship between codon usage and the physiological pattern of expression of a gene is investigated using a dataset of 815 nuclear genes of Arabidopsis thaliana. Factorial Correspondence Analysis, a multivariate statistical approach commonly used in codon usage analysis, was used to analyse codon usage bias gene by gene. The analysis reveals a single major trend in codon usage among genes in Arabidopsis. At one end of the trend lie genes with a highly G/C-biased codon usage. This group contains mainly photosynthetic and housekeeping genes, which are known to encode the most abundant proteins of the plant cell. At the other extreme lie genes with a weaker A/T-biased codon usage. This group contains genes with various functions, which most of the time exhibit a strong tissue-specific pattern of expression in relation, for example, to stress conditions. These observations were confirmed by the detailed analysis of codon usage in the multigene family of tubulins and appear to be general in plant species, even those as distant from Arabidopsis thaliana as a monocotyledonous plant such as maize.


Subject(s)
Arabidopsis/genetics , Codon/genetics , Databases, Factual , Genes, Plant , Base Composition , Base Sequence , Genome, Plant , Plant Proteins/genetics
10.
Electrophoresis ; 19(4): 515-27, 1998 Apr.
Article in English | MEDLINE | ID: mdl-9588797

ABSTRACT

The present availability of the genomic text of bacteria allows assignment of known biological functions to many genes (typically, half of the genome's gene content). It is now time to try to predict new, unexpected functions, using inductive procedures that correlate the content of the genomic text with possible biological functions. We show here that analysis of the genomes of Escherichia coli and Bacillus subtilis for the distribution of AGCT motifs predicts that genes exist for which the mRNA molecule can be translated as several different proteins synthesized after ribosomal frameshifting or hopping. Among these genes we found that several coded for the same function in E. coli and B. subtilis. We analyzed in depth the situation of the infB gene (experimentally known to specify synthesis of several proteins differing in their translation starts), the aceF/pdhC gene, the eno gene, and the rplI gene. In addition, genes specific to E. coli were also studied: ompA, ompF and tolA (predicting epigenetic variation that could help escape infection by phages or colicins).


Subject(s)
Bacillus subtilis/genetics , Escherichia coli Proteins , Escherichia coli/genetics , Frameshifting, Ribosomal , Genome, Bacterial , Microsatellite Repeats , Acetyltransferases/genetics , Amino Acid Sequence , Bacterial Outer Membrane Proteins/genetics , Bacterial Proteins/genetics , Consensus Sequence , Dihydrolipoyllysine-Residue Acetyltransferase , Glyceraldehyde-3-Phosphate Dehydrogenases/genetics , Mathematical Computing , Molecular Sequence Data , Peptide Initiation Factors/genetics , Phosphopyruvate Hydratase/genetics , Porins/genetics , Prokaryotic Initiation Factor-2 , Pyruvate Dehydrogenase Complex/genetics , RNA, Messenger , Ribosomal Proteins/genetics , Sequence Homology, Amino Acid
11.
DNA Res ; 4(4): 257-65, 1997 Aug 31.
Article in English | MEDLINE | ID: mdl-9405933

ABSTRACT

Analysis of the codon usage of genes coding for the structural components of the outer membrane in Escherichia coli is consistent with the requirement for high expression of these genes. Because porins (which constitute the major protein component of the outer membrane) and LPS (which constitutes the major outermost constituent of the outer membrane) are synthesized from genes displaying widely different codon usage, it is possible to investigate the origin of the outer membrane. The analysis predicts that the outer membrane might originate from a genome other than the genome coding for the major part of the cell. Such a special origin would explain, in structural terms, the likely lethality of porins if they were inadvertently inserted within the inner membrane, giving rise to the Gram-negative bacterial type, which has an envelope comprising two membranes instead of a single cytoplasmic membrane and a murein sacculus.


Subject(s)
Bacterial Outer Membrane Proteins/genetics , Codon , Escherichia coli/genetics , Genome, Bacterial , RNA, Transfer/genetics
12.
Comput Appl Biosci ; 13(2): 131-6, 1997 Apr.
Article in English | MEDLINE | ID: mdl-9146959

ABSTRACT

MOTIVATION: Compression algorithms can be used to analyse genetic sequences. A compression algorithm tests a given property on the sequence and uses it to encode the sequence: if the property is true, it reveals some structure of the sequence which can be described briefly; this yields a description of the sequence which is shorter than the sequence of nucleotides given in extenso. The more a sequence is compressed by the algorithm, the more significant the property is for that sequence. RESULTS: We present a compression algorithm that tests the presence of a particular type of dosDNA (defined ordered sequence-DNA): approximate tandem repeats of small motifs (i.e. of lengths < 4). This algorithm has been tested on four yeast chromosomes. The presence of approximate tandem repeats seems to be a uniform structural property of yeast chromosomes.


Subject(s)
Algorithms , DNA/genetics , Repetitive Sequences, Nucleic Acid , Base Sequence , Chromosomes, Fungal/genetics , DNA, Fungal/genetics , Evaluation Studies as Topic , Molecular Sequence Data , Saccharomyces cerevisiae/genetics , Sequence Analysis, DNA/methods , Sequence Analysis, DNA/statistics & numerical data , Software
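
The principle stated in the abstract above, that a property is significant for a sequence when encoding with it shortens the description, can be illustrated with a deliberately simplified scheme. The sketch below handles only exact tandem repeats of motifs of length 1 to 3, whereas the paper targets approximate repeats; the bit costs (a 1-bit flag, 2 bits per base, 2 bits for the motif length, 8 bits for the copy count) and the greedy strategy are assumptions chosen for illustration.

```python
def tandem_run(seq, start, motif_len):
    """Number of consecutive copies of seq[start:start+motif_len] at start."""
    motif = seq[start:start + motif_len]
    if len(motif) < motif_len:
        return 0
    copies = 1
    pos = start + motif_len
    while seq[pos:pos + motif_len] == motif:
        copies += 1
        pos += motif_len
    return copies


def compressed_bits(seq, max_motif=3, count_bits=8, flag_bits=1):
    """Greedy encoding cost in bits: every item carries a 1-bit flag, a
    literal base costs 2 more bits, and a repeat costs 2 bits per motif
    base, 2 bits for the motif length, and `count_bits` for the copy count."""
    bits = 0
    i = 0
    max_copies = 2 ** count_bits - 1
    while i < len(seq):
        # Default choice: emit one literal base.
        best_saving, best_span, best_cost = 0, 1, flag_bits + 2
        for m in range(1, max_motif + 1):
            copies = min(tandem_run(seq, i, m), max_copies)
            span = copies * m
            repeat_cost = flag_bits + 2 + 2 * m + count_bits
            saving = span * (flag_bits + 2) - repeat_cost
            if copies >= 2 and saving > best_saving:
                best_saving, best_span, best_cost = saving, span, repeat_cost
        bits += best_cost
        i += best_span
    return bits


def compression_gain(seq):
    """Bits saved relative to the plain 2-bits-per-base description."""
    return 2 * len(seq) - compressed_bits(seq)
```

A positive `compression_gain` indicates that tandem repeats describe the sequence more briefly than listing its bases. A repeat-free sequence yields a negative gain, because the flag bits are pure overhead.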
13.
J Mol Evol ; 44(2): 214-25, 1997 Feb.
Article in English | MEDLINE | ID: mdl-9069182

ABSTRACT

A computer-assisted analysis was made of 24 complete nucleotide sequences selected from the vertebrate retroviruses to represent the ten viral groups. The conclusions of this analysis extend and strengthen the previously made hypothesis on the Moloney murine leukemia virus: The evolution of the nucleotide sequence appears to have occurred mainly through at least three overlapping levels of duplication: (1) The distributions of overrepresented (3-6)-mers are consistent with the universal rule of a trend toward TG/CT excess and with the persistence of a certain degree of symmetry between the two strands of DNA. This suggests one or several original tandemly repeated sequences and some inverted duplications. (2) The existence of two general core consensuses at the level of these (3-6)-mers supports the hypothesis of a common evolutionary origin of vertebrate retroviruses. Consensuses more specific to certain sequences are compatible with phylogenetic trees established independently. The consensuses could correspond to intermediary evolutionary stages. (3) Most of the (3-6)-mers with a significantly higher than average frequency appear to be internally repeated (with monomeric or oligomeric internal iterations) and seem to be at least partly the cause of the bias observed by other researchers at the level of retroviral nucleotide composition. They suggest a third evolutionary stage by slippage-like stepwise local duplications.


Subject(s)
Base Composition , Evolution, Molecular , Repetitive Sequences, Nucleic Acid/genetics , Retroviridae/genetics , Animals , Consensus Sequence/genetics , DNA, Viral/chemistry , DNA, Viral/genetics , Molecular Sequence Data , Oligodeoxyribonucleotides/genetics , Phylogeny , Vertebrates
14.
Genetica ; 100(1-3): 271-9, 1997.
Article in English | MEDLINE | ID: mdl-9440280

ABSTRACT

We investigate the nucleotide sequences of 23 retroelements (4 mammalian retroviruses, 1 human, 3 yeast, 2 plant, and 13 invertebrate retrotransposons) in terms of their oligonucleotide composition in order to address the problem of the relationship between retrotransposons and retroviruses, and the coadaptation of these retroelements to their host genomes. We have identified by computer analysis over-represented 3- through 6-mers in each sequence. Our results indicate that retrotransposons are heterogeneous in contrast to retroviruses, suggesting different modes of evolution by slippage-like mechanisms. Moreover, we have calculated the Observed/Expected number ratio for each of the 256 tetramers and analysed the data using a multivariate approach. The tetramer composition of retroelement sequences appears to be influenced by host genomic factors like methylase activity.


Subject(s)
Genome , Retroelements/genetics , Retroviridae/genetics , Animals , Base Sequence , Humans , Molecular Sequence Data , Phylogeny
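
The Observed/Expected ratio for each of the 256 tetramers mentioned in the abstract above could be computed along the following lines. The abstract does not say which model supplies the expected counts, so this sketch assumes independent bases at the sequence's own mononucleotide frequencies; the function name is hypothetical.

```python
from collections import Counter
from itertools import product


def tetramer_oe(seq):
    """Observed/expected ratio for each of the 256 tetramers.
    Expected counts assume independent bases with the sequence's own
    mononucleotide frequencies (an assumption made for this sketch)."""
    seq = seq.upper()
    if len(seq) < 4:
        raise ValueError("sequence must contain at least one tetramer")
    n_words = len(seq) - 3
    base_freq = {b: seq.count(b) / len(seq) for b in "ACGT"}
    observed = Counter(seq[i:i + 4] for i in range(n_words))
    ratios = {}
    for letters in product("ACGT", repeat=4):
        word = "".join(letters)
        expected = n_words
        for b in word:
            expected *= base_freq[b]
        ratios[word] = observed[word] / expected if expected > 0 else 0.0
    return ratios


ratios = tetramer_oe("ACGT" * 25)  # made-up, highly repetitive toy sequence
```

Each sequence then becomes a vector of 256 ratios, which is the kind of table a multivariate method can take as input.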
16.
J Mol Biol ; 257(3): 574-85, 1996 Apr 05.
Article in English | MEDLINE | ID: mdl-8648625

ABSTRACT

This work reconsiders the GATC motif distribution in a 1.6 Mb segment of the Escherichia coli genome, compared to its distribution in phages and plasmids. At first sight the distribution of GATC words looks random. But when a realistic model of the chromosome (made of average genes having the same codon usage as in the real chromosome) is used as a theoretical reference, strong biases are observed. GATC pairs such as GATCNNGATC are under-represented while there is a strong positive selection for motifs separated by 10, 19, 70 and 1100 bp. The last class is the only one present in E. coli parasites. It can be ascribed to the triggering sequences of the long-patch mismatch repair system. The 6 bp class overlaps with the consensus of CAP (catabolite activator protein) and FNR (fumarate/nitrate regulator) binding sites, thus accounting for counter-selection. The other classes, which could be targets for a nucleic acid-binding protein, are almost always present inside protein coding sequences, and are members of clusters of GATC motifs. Analysis of the genes containing these motifs suggests that they correspond to a regulatory process monitoring the shift from anaerobic to aerobic growth conditions. In particular this regulation, closing down transcription of a large number of genes involved in intermediary metabolism, would be well suited for the cold and oxygen shift from the mammal's gut to the standard environmental conditions. In this process the methylation status of GATC clusters would be very important for tuning transcription, and a DNA-binding protein, probably a member of the cold-shock protein family, would be needed for alleviating the effects mediated by slackening of the pace of methylation during the shift.


Subject(s)
Bacteriophages/genetics , DNA, Bacterial/genetics , Escherichia coli/genetics , Oligonucleotides/genetics , Plasmids/genetics , Sequence Analysis, DNA , Base Sequence , Molecular Sequence Data
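
The spacing classes discussed in the abstract above (6, 10, 19, 70 and 1100 bp) come from tabulating distances between GATC occurrences. A minimal sketch of that tabulation follows. It assumes distances are measured start to start between consecutive occurrences, under which GATCNNGATC corresponds to 6 bp; the abstract does not state the convention, and the paper's reference model of the chromosome is not reproduced here.

```python
from collections import Counter


def motif_positions(seq, motif="GATC"):
    """Start positions of every (possibly overlapping) motif occurrence."""
    positions = []
    i = seq.find(motif)
    while i != -1:
        positions.append(i)
        i = seq.find(motif, i + 1)
    return positions


def spacing_histogram(seq, motif="GATC"):
    """Count start-to-start distances between consecutive occurrences."""
    pos = motif_positions(seq, motif)
    return Counter(b - a for a, b in zip(pos, pos[1:]))


# Made-up toy sequence containing one GATCNNGATC pair.
hist = spacing_histogram("TTGATCAAGATCCCCCGATCTT")
```

Over- or under-representation of a spacing class is then judged by comparing such a histogram with the one obtained from a reference model, which in the paper is a chromosome of average genes with realistic codon usage.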
17.
J Mol Biol ; 250(2): 123-7, 1995 Jul 07.
Article in English | MEDLINE | ID: mdl-7608964

ABSTRACT

The availability of specialized sequence databanks for Escherichia coli, Saccharomyces cerevisiae and Bacillus subtilis made it possible to build a set of 105 protein-coding genes that are homologous in these three species. An analysis of the triplets at both the nucleotide and amino acid level revealed that the codon bias of some amino acids is significantly higher at conserved than at non-conserved positions. Comparisons of homologous genes in E. coli and Salmonella typhimurium, and in S. cerevisiae and Drosophila melanogaster, led to the same conclusion. A special case was made for serine in E. coli, whose major codon is AGC for non-conserved and TCC for conserved residues. We interpret this observation as evidence that the primordial codons for serine were TCN, while codons AGY appeared later. This conclusion is substantiated by an analysis of the codon usage of catalytic serine residues in ancient, ubiquitous and essential proteins (ATP synthases and topoisomerases). It is shown that in these proteins the proportion of the catalytic serine residues coded by TCN is significantly higher than that expected from the overall codon usage of serine residues.


Subject(s)
Biological Evolution , Codon/genetics , Conserved Sequence/genetics , Genetic Code/genetics , Serine/genetics , Amino Acid Sequence , Bacillus subtilis/genetics , Base Sequence , Escherichia coli/genetics , Saccharomyces cerevisiae/genetics
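
The serine comparison in the abstract above rests on splitting the six serine codons into the TCN family (TCT, TCC, TCA, TCG) and the AGY family (AGT, AGC). The sketch below counts the two families in a coding sequence. It is only the counting step: the paper's contrast between conserved and non-conserved positions would require an alignment, which is not modeled here, and the function names and toy sequence are assumptions.

```python
from collections import Counter

TCN = {"TCT", "TCC", "TCA", "TCG"}
AGY = {"AGT", "AGC"}


def serine_codon_usage(cds):
    """Count serine codons by family (TCN vs AGY) in an in-frame coding
    sequence; trailing bases that do not fill a codon are ignored."""
    cds = cds.upper()
    counts = Counter()
    for i in range(0, len(cds) - 2, 3):
        codon = cds[i:i + 3]
        if codon in TCN:
            counts["TCN"] += 1
        elif codon in AGY:
            counts["AGY"] += 1
    return counts


def tcn_fraction(cds):
    """Fraction of serine codons from the TCN family (None if no serine)."""
    counts = serine_codon_usage(cds)
    total = counts["TCN"] + counts["AGY"]
    return counts["TCN"] / total if total else None


# Made-up toy coding sequence: ATG TCC AGC TCT AGT TCA TAA.
fraction = tcn_fraction("ATGTCCAGCTCTAGTTCATAA")
```

Applying `tcn_fraction` separately to codons at conserved and at non-conserved alignment columns would give the two proportions whose difference the abstract reports.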
18.
C R Acad Sci III ; 318(5): 599-608, 1995 May.
Article in English | MEDLINE | ID: mdl-7671006

ABSTRACT

Complex genomes contain numerous simple sequence repeats, the biological significance of which remains obscure. Recently it has been shown that several human diseases are the result of changes in such sequences. Thus it has become urgent to undertake a systematic study of their properties. We have set the task of describing as completely as possible the set of sequences which contain bases organized according to symmetrical elements, the dosDNA: defined ordered sequence. Examination of local anomalies in dinucleotide composition serves to identify dosDNA zones in the genome. The study of chromosomes II, III, VIII and XI of Saccharomyces cerevisiae reveals these dosDNA zones comprise about 2% of the genome. They are regularly distributed along the chromosomes, regardless of the functional significance of the sequence. A more detailed analysis of dosDNA segments seems to indicate that simple repeats are the consequence of local properties of the chromosome, and not due to any motif in particular.


Subject(s)
Chromosomes, Fungal/genetics , DNA, Fungal/chemistry , Repetitive Sequences, Nucleic Acid , Saccharomyces cerevisiae/genetics , Base Sequence , Genetic Variation , Molecular Sequence Data
19.
Comput Appl Biosci ; 10(4): 401-8, 1994 Jul.
Article in English | MEDLINE | ID: mdl-7804872

ABSTRACT

A program for assembling sequences by using a global approach has been developed. By successive steps, a more and more precise classification of DNA fragments permits the positioning of the sequences on the contig; after having detected the pairs of overlapping sequences, groups are formed such that all sequences in a group overlap. Sequences common to several groups enable the groups to be ordered in a series. Ambiguities in the order of groups can arise at this stage, due to the presence of repeated fragments; different solutions are then proposed. Putting the groups into order leads to a preclassification of sequences. The fragments are then aligned by group, by searching for words common to all sequences in the group, and using an algorithm of dynamic programming. A detailed example on a set of nine sequences accompanies the description of the method.


Subject(s)
Sequence Analysis, DNA/methods , Software , Algorithms , Base Sequence , DNA/genetics , DNA, Complementary/genetics , Genetic Techniques , Molecular Sequence Data , Sequence Alignment/methods , Sequence Alignment/statistics & numerical data , Sequence Analysis, DNA/statistics & numerical data
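
The first step of the assembly method in the abstract above is detecting pairs of overlapping sequences. The sketch below shows one simple way to do that with exact suffix-prefix matches on a single strand. It is an assumption-laden illustration rather than the program's algorithm, which searches for common words and uses dynamic programming; the names, the `min_len` threshold, and the toy fragments are hypothetical.

```python
from itertools import permutations


def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that is a prefix of b
    (0 if shorter than min_len)."""
    for size in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:size]):
            return size
    return 0


def overlapping_pairs(fragments, min_len=3):
    """Map each ordered pair of fragment names to its overlap length."""
    pairs = {}
    for (name_a, a), (name_b, b) in permutations(fragments.items(), 2):
        size = overlap(a, b, min_len)
        if size:
            pairs[(name_a, name_b)] = size
    return pairs


# Made-up toy fragments.
fragments = {"f1": "ACGTTGCA", "f2": "TGCAGGTC", "f3": "GGTCAAAT"}
pairs = overlapping_pairs(fragments)
```

From such a pair table, fragments can be grouped and the groups ordered along the contig. A repeated fragment shows up as a fragment with several competing partners, which is the source of the ordering ambiguities the abstract mentions.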
20.
Microbiol Rev ; 57(3): 623-54, 1993 Sep.
Article in English | MEDLINE | ID: mdl-8246843

ABSTRACT

Several data libraries have been created to organize all the data obtained worldwide about the Escherichia coli genome. Because the known data now amount to more than 40% of the whole genome sequence, it has become necessary to organize the data in such a way that appropriate procedures can associate knowledge produced by experiments about each gene to its position on the chromosome and its relation to other relevant genes, for example. In addition, global properties of genes, affected by the introduction of new entries, should be present as appropriate description fields. A data base, implemented on Macintosh by using the data base management system 4th Dimension, is described. It is constructed around a core constituted by known contigs of E. coli sequences and links data collected in general libraries (unmodified) to data associated with evolving knowledge (with modifiable fields). Biologically significant results obtained through the coupling of appropriate procedures (learning or statistical data analysis) are presented. The data base is available through a 4th Dimension runtime and through FTP on Internet. It has been regularly updated and will be systematically linked to other E. coli data bases (M. Kroger, R. Wahl, G. Schachtel, and P. Rice, Nucleic Acids Res. 20(Suppl.):2119-2144, 1992; K. E. Rudd, W. Miller, C. Werner, J. Ostell, C. Tolstoshev, and S. G. Satterfield, Nucleic Acids Res. 19:637-647, 1991) in the near future.


Subject(s)
Databases, Factual , Escherichia coli/genetics , Genome, Bacterial , Bacterial Proteins/genetics , Base Sequence , Chromosome Mapping , Chromosomes, Bacterial , DNA Replication , Data Display , Database Management Systems , Genes, Bacterial , Models, Theoretical , Molecular Sequence Data , Transcription, Genetic