1.
Bioinformatics ; 21(4): 545-7, 2005 Feb 15.
Article in English | MEDLINE | ID: mdl-15374859

ABSTRACT

UNLABELLED: Correspondence analysis of codon usage data is a widely used method in sequence analysis, but the variability in amino acid composition between proteins is a confounding factor when one wants to analyse synonymous codon usage variability. A simple and natural way to cope with this problem is to use within-group correspondence analysis. There is, however, no user-friendly implementation of this method available for genomic studies. Our motivation was to provide to the community a Web facility to easily study synonymous codon usage on a subset of data available in public genomic databases. AVAILABILITY: Availability through the Pole Bioinformatique Lyonnais (PBIL) Web server at http://pbil.univ-lyon1.fr/datasets/charif04/ with a demo allowing us to reproduce the figure in the present application note. All underlying software is distributed under a GPL licence. CONTACT: http://pbil.univ-lyon1.fr/members/lobry.
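The confounding effect described above can be illustrated with a small Python sketch. It does not implement within-group correspondence analysis itself; it only shows the underlying idea of expressing each codon count relative to the other codons for the same amino acid, so that amino acid composition no longer drives the numbers. The function name `synonymous_codon_frequencies` and the use of the standard genetic code are assumptions made for illustration and are not taken from the application note.

```python
from collections import Counter, defaultdict

# Standard genetic code, built from the usual compact TCAG ordering (assumption:
# the sequences analysed use the standard code).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def synonymous_codon_frequencies(cds):
    """Return, for each codon seen in `cds`, its frequency among the codons
    that encode the same amino acid (hypothetical helper)."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    counts = Counter(c for c in codons if c in CODON_TABLE)

    # Total count per amino acid (the "group" in the within-group sense).
    totals = defaultdict(int)
    for codon, n in counts.items():
        totals[CODON_TABLE[codon]] += n

    return {codon: n / totals[CODON_TABLE[codon]] for codon, n in counts.items()}


if __name__ == "__main__":
    for codon, freq in sorted(synonymous_codon_frequencies("GCTGCTGCCAAA").items()):
        print(codon, CODON_TABLE[codon], round(freq, 3))
```

Two genes with very different alanine content but the same preference for GCT over GCC would get the same within-alanine frequencies here, which is the kind of variability the note aims to isolate.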


Subject(s)
Algorithms , Codon/genetics , Sequence Alignment/methods , Sequence Analysis, DNA/methods , Sequence Analysis, Protein/methods , Software , User-Computer Interface , Computer Graphics , Online Systems
2.
J Clin Microbiol ; 41(4): 1785-7, 2003 Apr.
Article in English | MEDLINE | ID: mdl-12682188

ABSTRACT

BIBI was designed to automate DNA sequence analysis for bacterial identification in the clinical field. BIBI relies on the use of BLAST and CLUSTAL W programs applied to different subsets of sequences extracted from GenBank. These sequences are filtered and stored in a new database, which is adapted to bacterial identification.


Subject(s)
Bacteria/classification , Computational Biology/methods , Software , Bacteria/genetics , Bacterial Infections/microbiology , Databases, Genetic , Databases, Nucleic Acid , Humans , Phylogeny , Sequence Analysis, DNA
4.
FEMS Microbiol Lett ; 197(1): 111-6, 2001 Apr 01.
Article in English | MEDLINE | ID: mdl-11287155

ABSTRACT

The actinomycete Frankia has never been transformed genetically. To favour the development of Frankia cloning vectors, we have fully sequenced the Frankia alni pFQ31 cryptic plasmid and performed analyses to characterise its coding and non-coding regions. This plasmid is 8551 bp long and contains 72% G+C. Computer-assisted analyses identified 18 open reading frames (ORFs). These ORFs show a synonymous codon usage different from that of Frankia chromosomal genes, suggesting an evolutionary bias linked to the nature of the replicon or a horizontal transfer. Three ORFs were found to encode genes likely to be involved in plasmid replication and stability: parFA (partition protein), ptrFA (transcriptional repressor of the GntR family) and repFA (initiation of replication). DNA signatures of a replication origin were identified in the ptrFA-repFA intergenic region. These structural motifs are similar to those observed among origins of iteron-containing plasmids replicating via a θ mode.
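The abstract does not say which programs produced the G+C figure or the ORF list, so the following Python sketch is only an illustration of the two computations under simple assumptions: ORFs start at ATG, end at the first in-frame stop codon, and are searched on the forward strand only. The names `gc_content`, `find_orfs` and the `min_codons` threshold are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def gc_content(seq):
    """Fraction of G and C bases in a nucleotide sequence."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def find_orfs(seq, min_codons=2):
    """Return (start, end) positions of ATG...stop ORFs in the three forward
    frames; `end` is exclusive and includes the stop codon."""
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first ATG opens a candidate ORF
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None  # look for the next ORF in this frame
    return orfs


if __name__ == "__main__":
    toy = "CCATGGCGTGACC"
    print(round(gc_content(toy), 3), find_orfs(toy))
```

A realistic scan of a high-G+C replicon would also consider the reverse strand and alternative start codons, which this sketch leaves out.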


Subject(s)
Actinomycetales/genetics , Nitrogen Fixation , Plasmids/genetics , Symbiosis , Actinomycetales/growth & development , Codon/genetics , DNA Replication , Molecular Sequence Data , Open Reading Frames/genetics , Replication Origin , Sequence Analysis, DNA
5.
Genome Inform ; 12: 155-64, 2001.
Article in English | MEDLINE | ID: mdl-11791234

ABSTRACT

It has been claimed that complete genome sequences would clarify phylogenetic relationships between organisms but, up to now, no satisfactory approach has been proposed to use these data efficiently. For instance, although coding the presence or absence of genes in complete genomes gives interesting results, it does not take into account the phylogenetic information contained in sequences and ignores hidden paralogy by using a similarity-based definition of orthology. Also, concatenation of sequences of different genes hardly takes into consideration the specific evolutionary rate of each gene. Finally, building a consensus tree is strongly limited by the low number of genes shared among all organisms. Here, we use a new method based on supertree construction, which makes it possible to combine in one supertree the information and statistical support of hundreds of trees from orthologous gene families, and to build the phylogeny of 33 prokaryotes and four eukaryotes with completely sequenced genomes. This approach gives a robust supertree, which demonstrates that a phylogeny of prokaryotic species is conceivable and challenges the hypothesis of a thermophilic origin of bacteria and present-day life. The results are compatible with the hypothesis of a core of genes for which lateral transfers are rare, but they raise doubts about the widely accepted "complexity hypothesis", which predicts that this core is mainly implicated in informational processes.


Subject(s)
Bacteria/classification , Bacteria/genetics , Phylogeny , Computational Biology , Gene Transfer, Horizontal , Genome, Bacterial , Genomics/statistics & numerical data , Models, Genetic
6.
Genome Res ; 10(3): 379-85, 2000 Mar.
Article in English | MEDLINE | ID: mdl-10720578

ABSTRACT

We present here HOBACGEN, a database system devoted to comparative genomics in bacteria. HOBACGEN contains all available protein genes from bacteria, archaea, and yeast, taken from SWISS-PROT/TrEMBL and classified into families. It also includes multiple alignments and phylogenetic trees built from these families. The database is organized under a client/server architecture with a client written in Java, which may run on any platform. This client integrates a graphical interface allowing users to select families according to various criteria and notably to select homologs common to a given set of taxa. This interface also allows users to visualize multiple alignments and trees associated with families. In tree displays, protein gene names are colored according to the taxonomy of the corresponding organisms. Users may access all information associated with sequences and multiple alignments by clicking on genes. This graphic tool thus gives rapid and simple access to all data required to interpret homology relationships between genes and distinguish orthologs from paralogs. Instructions for installation of the client or the server are available at http://pbil.univ-lyon1.fr/databases/hobacgen.html.


Subject(s)
Databases, Factual , Genetics, Microbial , Genome, Bacterial , Sequence Homology, Amino Acid , Software , Amino Acid Sequence , Base Sequence , Internet , Molecular Sequence Data , Sequence Alignment
7.
Nucleic Acids Res ; 28(1): 68-71, 2000 Jan 01.
Article in English | MEDLINE | ID: mdl-10592183

ABSTRACT

As the number of complete microbial genomes publicly available is still growing, the problem of annotation quality in these very large sequences remains unsolved. Indeed, the number of annotations associated with complete genomes is usually lower than those of the shorter entries encountered in the repository collections. Moreover, classical sequence database management systems have difficulties in handling entries of such size. In this context, the Enhanced Microbial Genomes Library (EMGLib) was developed to try to alleviate these problems. This library contains all the complete genomes from prokaryotes (bacteria and archaea) already sequenced and the yeast genome in GenBank format. The annotations are improved by the introduction of data on codon usage, gene orientation on the chromosome and gene families. It is possible to access EMGLib through two database systems set up on WWW servers: the PBIL server at http://pbil.univ-lyon1.fr/emglib.html and the MICADO server at http://locus.jouy.inra.fr/micado


Subject(s)
Databases, Factual , Genome, Archaeal , Genome, Bacterial , Genome, Fungal , Genome, Protozoan , Base Sequence , Molecular Sequence Data , User-Computer Interface
8.
Bioinformatics ; 15(5): 424-5, 1999 May.
Article in English | MEDLINE | ID: mdl-10366663

ABSTRACT

SUMMARY: JaDis is a Java application for computing evolutionary distances between nucleic acid sequences and G+C base frequencies. It allows specific comparison of coding sequences, of non-coding sequences or of a non-coding sequence with coding sequences. AVAILABILITY: http://pbil.univ-lyon1.fr/software/jadis.html
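The summary does not list which distance models JaDis offers, so the sketch below should be read as a generic illustration rather than a description of that program. It computes G+C frequency, the proportion of differing sites between two aligned sequences (p-distance), and the Jukes-Cantor correction d = -(3/4) ln(1 - 4p/3), chosen here only as a well-known example of an evolutionary distance. All function names are hypothetical, and Python is used for the illustration even though JaDis itself is written in Java.

```python
import math


def gc_fraction(seq):
    """Fraction of G and C among the unambiguous bases of a sequence."""
    bases = [b for b in seq.upper() if b in "ACGT"]
    if not bases:
        return 0.0
    return sum(b in "GC" for b in bases) / len(bases)


def p_distance(a, b):
    """Proportion of differing sites between two aligned sequences,
    skipping columns with gaps or ambiguous symbols."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(x, y) for x, y in zip(a.upper(), b.upper())
             if x in "ACGT" and y in "ACGT"]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(x != y for x, y in pairs) / len(pairs)


def jukes_cantor_distance(a, b):
    """Jukes-Cantor corrected distance; infinite when p >= 0.75."""
    p = p_distance(a, b)
    if p >= 0.75:
        return math.inf
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


if __name__ == "__main__":
    s1, s2 = "ACGTACGT", "ACGTACGA"
    print(gc_fraction(s1), p_distance(s1, s2), jukes_cantor_distance(s1, s2))
```

The corrected distance is always at least as large as the raw proportion of differences, because it accounts for multiple substitutions at the same site.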


Subject(s)
Sequence Analysis, DNA/methods , Software , Base Composition
9.
Nucleic Acids Res ; 27(1): 63-5, 1999 Jan 01.
Article in English | MEDLINE | ID: mdl-9847143

ABSTRACT

Since the complete sequence of Haemophilus influenzae Rd was obtained in 1995, the number of entirely sequenced bacterial genomes has increased regularly. A problem is that the quality of the annotations of these very large sequences is usually lower than that of the shorter entries encountered in the repository collections. Moreover, classical sequence database management systems have difficulties in handling entries of that size. In this context, we have decided to build the Enhanced Microbial Genomes Library (EMGLib) in which these two problems are alleviated. This library contains all the complete genomes from bacteria already sequenced and the yeast genome in GenBank format. The annotations are improved by the introduction of data on codon usage, gene orientation on the chromosome and gene families. It is possible to access EMGLib through two database systems set up on World Wide Web servers: the PBIL server at http://pbil.univ-lyon1.fr/emglib/emglib.html and the MICADO server at http://locus.jouy.inra.fr/micado


Subject(s)
Databases, Factual , Genome, Bacterial , Genome, Fungal , Genomic Library , Base Sequence , Codon/genetics , Databases, Factual/trends , Genes/genetics , Information Storage and Retrieval , Internet , Molecular Sequence Data , User-Computer Interface
10.
Nucleic Acids Res ; 26(1): 60-2, 1998 Jan 01.
Article in English | MEDLINE | ID: mdl-9399801

ABSTRACT

The non-redundant Bacillus subtilis database (NRSub) has been developed in the context of the sequencing project devoted to this bacterium. As this project has reached completion, the whole genome is now available as a single contig. Thanks to the ACNUC database management system and its associated retrieval system Query_win, each functional region of the genome can be accessed individually. Extra annotations have been added such as accession numbers for the genes, locations on the genetic map, codon adaptation index values, as well as cross-references with other collections. NRSub is distributed through anonymous FTP as a text file in EMBL format and as an ACNUC database. It is also possible to access NRSub through two dedicated World Wide Web servers located in France (http://acnuc.univ-lyon1.fr/nrsub/nrsub.html) and in Japan (http://ddbjs4h.genes.nig.ac.jp/).


Subject(s)
Bacillus subtilis/genetics , Databases, Factual , Computer Communication Networks , Databases, Factual/trends
11.
Nucleic Acids Res ; 25(1): 53-6, 1997 Jan 01.
Article in English | MEDLINE | ID: mdl-9016504

ABSTRACT

In the context of the international project aiming at sequencing the whole genome of Bacillus subtilis we have developed NRSub, a non-redundant database of sequences from this organism. Starting from the B. subtilis sequences available in the repository collections we have removed all encountered duplications, then we have added extra annotations to the sequences (e.g. accession numbers for the genes, locations on the genetic map, codon usage index). We have also added cross-references with EMBL/GenBank/DDBJ, MEDLINE, SWISS-PROT and ENZYME databases. NRSub is distributed through anonymous FTP as a text file in EMBL format and as an ACNUC database. It is also possible to access the database through two dedicated World Wide Web servers located in France (http://acnuc.univ-lyon1.fr/nrsub/nrsub.html) and in Japan (http://ddbjs4h.genes.nig.ac.jp/).


Subject(s)
Bacillus subtilis/genetics , Base Sequence , Databases, Factual , Academies and Institutes , Computer Communication Networks , France
12.
Comput Appl Biosci ; 12(6): 507-10, 1996 Dec.
Article in English | MEDLINE | ID: mdl-9021269

ABSTRACT

LALNVIEW is a graphical program for visualising local alignments between two sequences (protein or nucleic acids). Sequences are represented by coloured rectangles to give an overall picture of their similarities. LALNVIEW can display sequence features (exon, intron, active site, domain, propeptide, etc.) along with the alignment. When using LALNVIEW through our Web servers, sequence features are automatically extracted from database annotations (SWISS-PROT, GenBank, EMBL or HOVERGEN) and displayed with the alignment. LALNVIEW is a useful tool for analysing pairwise sequence alignments and for making the link between sequence homology and what is known about the structure or function of sequences. LALNVIEW executables for UNIX, Macintosh and PC computers are freely available from our server (http://expasy.hcuge.ch/sprot/lalnview.html).


Subject(s)
Computer Graphics , Sequence Alignment/methods , Software , Acyltransferases/genetics , Animals , Base Sequence , Computer Communication Networks , DNA/genetics , Databases, Factual , Epidermal Growth Factor/genetics , Evaluation Studies as Topic , Factor IX/genetics , Humans , Molecular Sequence Data , Proteins/genetics , Sequence Homology, Nucleic Acid
13.
Comput Appl Biosci ; 12(6): 519-24, 1996 Dec.
Article in English | MEDLINE | ID: mdl-9021271

ABSTRACT

This report describes two applications of a multivariate method for studying classes of nucleotide or protein sequences: correspondence discriminant analysis (CDA). The first example is the discrimination between Escherichia coli proteins according to their subcellular location (membrane, cytoplasm and periplasm). The high resolution of the method made it possible to predict the subcellular location of E. coli proteins for which this information is not known. The second example is discrimination between the coding sequences of leading and lagging strands in four bacteria: Mycoplasma genitalium, Haemophilus influenzae, E. coli and Bacillus subtilis. The programs used for computing the analysis are integrated in a publicly available package that runs on MacOS 7.x or Windows 95 operating systems (http://biomserv.univ-lyon1.fr/ADE-4.html). These programs are also accessible through our World Wide Web server (http://biomserv.univ-lyon1.fr/NetMul.html).


Subject(s)
DNA/genetics , Discriminant Analysis , Multivariate Analysis , Proteins/genetics , Software , Amino Acid Sequence , Bacillus subtilis/genetics , Bacterial Proteins/genetics , Codon/genetics , DNA, Bacterial/genetics , Escherichia coli/genetics , Haemophilus influenzae/genetics , Membrane Proteins/genetics , Molecular Sequence Data , Mycoplasma/genetics , Sequence Analysis, DNA , Species Specificity
14.
Comput Appl Biosci ; 12(1): 63-9, 1996 Feb.
Article in English | MEDLINE | ID: mdl-8670621

ABSTRACT

We have developed a World-Wide Web server for browsing sequence collections structured under the ACNUC format and for performing multivariate analyses on sequences. General collections (like GenBank or EMBL), as well as specialized data banks (like Hovergen and NRSub) can be accessed. This system allows complex queries to be constructed, and the result of each query, represented by a list of sequences, is stored on the server. It is then possible to reuse this list to compute multivariate analyses on the sequences. Two examples of applications are shown. The first one consists in a study of codon usage with correspondence analysis on all the protein genes of Haemophilus influenzae Rd. This study allows the highly expressed genes and the integral membrane proteins of this organism to be identified. The second one consists in an ordering of 70 aligned protein sequences of growth hormone with principal coordinate analysis. With this method, we are able to re-establish the patterns of relationships between the sequences previously determined with tree building programs.


Subject(s)
Computer Communication Networks , Databases, Factual , Molecular Biology/statistics & numerical data , Online Systems , Sequence Analysis/methods , Algorithms , Animals , Biometry , Codon/genetics , Evaluation Studies as Topic , Growth Hormone/genetics , Haemophilus influenzae/genetics , Multivariate Analysis , Sequence Analysis/statistics & numerical data
15.
Biochimie ; 78(5): 364-9, 1996.
Article in English | MEDLINE | ID: mdl-8905155

ABSTRACT

We have developed a World Wide Web (WWW) version of the sequence retrieval system Query: WWW-Query. This server allows users to query nucleotide sequence banks in the EMBL/GenBank/DDBJ formats and protein sequence banks in the NBRF/PIR format. WWW-Query includes all the features of the on-line sequence browsers already available: the possibility of building complex queries, integration of cross-references with different data banks, and access to the functional zones of biological interest. It also provides original services not available elsewhere: introduction of the notion of re-usable sequence lists, integration of dedicated helper applications for visualizing alignments and phylogenetic trees, and links with multivariate methods for studying codon usage or for complementing phylogenies.


Subject(s)
Base Sequence , Computer Communication Networks , Information Systems , Molecular Sequence Data , Amino Acid Sequence , Phylogeny , Sequence Alignment , Sequence Homology, Amino Acid , Software
16.
Nucleic Acids Res ; 24(1): 41-5, 1996 Jan 01.
Article in English | MEDLINE | ID: mdl-8594597

ABSTRACT

In the context of the international project aimed at sequencing the whole genome of Bacillus subtilis we have developed a non-redundant, fully annotated database of sequences from this organism. Starting from the B. subtilis sequences available in the EMBL, GenBank and DDBJ collections we have removed all encountered duplications and then added extra annotations to the sequences (e.g. accession numbers for the genes, locations on the genetic map, codon usage, etc.). We have also added cross-references to the EMBL, MEDLINE, SWISS-PROT and ENZYME data banks. The present system results from the merging of the NRSub and SubtiList databases, and the sequence contigs used in the two systems are identical. NRSub is distributed as a flatfile in EMBL format (which is supported by most sequence analysis software packages) and as an ACNUC database, while SubtiList is distributed as a relational database under 4th Dimension. It is possible to access the data through two dedicated World Wide Web servers located in France and Japan.


Subject(s)
Bacillus subtilis/genetics , Databases, Factual , Genome, Bacterial , Base Sequence , Computer Communication Networks , Molecular Sequence Data
17.
Nucleic Acids Res ; 22(25): 5525-9, 1994 Dec 25.
Article in English | MEDLINE | ID: mdl-7838704

ABSTRACT

We have organized the DNA sequences of Bacillus subtilis from the EMBL collection to build the NRSub data base. This data base is free from duplications and all detected overlapping sequences are merged into contigs. Data on gene mapping and codon usage are also included. NRSub is publicly available through anonymous FTP in flat file format or structured in the form of an ACNUC data base. Under this format, it is possible to use NRSub with the retrieval program Query_win. This program integrates a graphical interface and may be installed on any kind of UNIX computer running X Window on which the Vibrant and Motif libraries are available.
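The abstract does not describe how overlaps were detected or merged, so the Python sketch below only illustrates the general idea of joining two overlapping entries into one contig. It assumes an exact match between the end of one sequence and the start of the other; the names `overlap_length` and `merge_pair` and the `min_overlap` threshold are hypothetical.

```python
def overlap_length(left, right, min_overlap=3):
    """Length of the longest suffix of `left` that exactly matches a prefix
    of `right`, or 0 if it is shorter than `min_overlap`."""
    max_len = min(len(left), len(right))
    for k in range(max_len, min_overlap - 1, -1):
        if left.endswith(right[:k]):
            return k
    return 0


def merge_pair(left, right, min_overlap=3):
    """Merge two sequences into a contig if they overlap; return None otherwise."""
    k = overlap_length(left, right, min_overlap)
    if k == 0:
        return None
    return left + right[k:]  # keep the shared region only once


if __name__ == "__main__":
    print(merge_pair("ATGGCGT", "GCGTTAA"))
    print(merge_pair("AAAA", "CCCC"))
```

Real database entries may differ by sequencing errors or lie on opposite strands, so a production merge would need approximate matching and reverse-complement handling, which this sketch omits.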


Subject(s)
Bacillus subtilis/genetics , DNA, Bacterial , Base Sequence , Codon , Computer Graphics , Databases, Factual , Genes, Bacterial , Molecular Sequence Data
18.
Biochimie ; 75(5): 415-22, 1993.
Article in English | MEDLINE | ID: mdl-8347728

ABSTRACT

ColiGene is an object-centered knowledge base for the study of gene expressivity in Escherichia coli by DNA sequence analysis. This system was developed with the knowledge base management system SHIRKA. Objects represented in ColiGene are biological structures such as genes or regulatory signals. They are organized in a hierarchical structure of classes, subclasses and instances. Navigation through the knowledge base and the building of queries are done using a graphical interface. The base is coupled with the data base ACNUC, which structures a specialized collection of sequences: EcoSeq. Several tools are also associated with ColiGene, either for sequence analysis or for more general purposes. Some biological results obtained using ColiGene are summarized here.


Subject(s)
Artificial Intelligence , Escherichia coli/genetics , Gene Expression , Genome, Bacterial , Sequence Analysis, DNA , Database Management Systems , Databases, Factual , Salmonella typhimurium/genetics
19.
Article in English | MEDLINE | ID: mdl-7584353

ABSTRACT

The number of biological sequences introduced in the general collections and the growing complexity of biological knowledge require the construction of models to formalize this knowledge, particularly the relationships between several data types. Two examples of such situations are presented here; they result from the biological research led in our team in the field of molecular evolution. ColiGene is a model of E. coli genetics devoted to the analysis of relationships between genomic sequences and gene expressivity. MultiMap implements a new formalization of genome maps allowing manipulation of "maps of maps" in two species. Applications of ColiGene and MultiMap are not restricted to molecular evolution and, for instance, MultiMap offers new capabilities for inferring data on a genome from knowledge of another species. This could be essential for many mapping projects (human, mouse but also other mammals like pig). Development and implementation of those models have been done using an object-oriented knowledge base management system (SHIRKA) interfaced with a dedicated genomic data base management system (ACNUC). Graphical interfaces have been designed to give an environment similar to the representations used by biologists.


Subject(s)
Chromosome Mapping/methods , Sequence Analysis, DNA/methods , Software , Animals , Databases, Factual , Escherichia coli/genetics , Eukaryotic Cells , Gene Expression , Genome , Humans , Mice , Programming Languages , Prokaryotic Cells , Software Design , User-Computer Interface
20.
Article in English | MEDLINE | ID: mdl-7584356

ABSTRACT

Large scale genome sequencing projects are now producing huge amounts of data which can be readily stored and managed within data base management systems, and analyzed using dedicated software packages. The results of these analyses should also be stored with the input DNA sequences. The increasing complexity and size of the objects to be described and managed have led biologists to rely on advanced data models such as the object-oriented model. As a joint effort between our computer science and molecular biology research projects, the knowledge bases we have developed in molecular genetics have shown, however, that the basic object-oriented model is not fully adapted to the complexity of some of the biological situations encountered. Advanced descriptive capabilities, provided only by knowledge models originating from the AI field, are required. Composite or evolving objects, multiple viewpoints, constraints, tasks and methods, and textual annotations are some examples of such capabilities. They are illustrated by biological situations for which they appeared to be necessary. Supporting powerful reasoning mechanisms (e.g. object classification, constraint propagation or qualitative simulators), they allow the development of large knowledge bases in molecular biology. These knowledge bases are expected to become the adequate support for co-operative distributed research efforts.


Subject(s)
Databases, Factual , Molecular Biology/methods , Sequence Analysis/methods , Artificial Intelligence , Software , Software Design