Results 1 - 20 of 36
1.
Ann Oncol ; 30(5): 757-765, 2019 05 01.
Article in English | MEDLINE | ID: mdl-30865223

ABSTRACT

BACKGROUND: Antitumor activity of molecular-targeted agents is guided by the presence of documented genomic alteration in specific histological subtypes. We aim to explore the feasibility, efficacy and therapeutic impact of molecular profiling in routine setting. PATIENTS AND METHODS: This multicentric prospective study enrolled adult or pediatric patients with solid or hematological advanced cancer previously treated in advanced/metastatic setting and noneligible to curative treatment. Each molecular profile was established on tumor, relapse or biopsies, and reviewed by a molecular tumor board (MTB) to identify molecular-based recommended therapies (MBRT). The main outcome was to assess the incidence rate of genomic mutations in routine setting, across specific histological types. Secondary objectives included a description of patients with actionable alterations and for whom MBRT was initiated, and overall response rate. RESULTS: Four centers included 2579 patients from February 2013 to February 2017, and the MTB reviewed the molecular profiles achieved for 1980 (76.8%) patients. The most frequently altered genes were CDKN2A (N = 181, 7%), KRAS (N = 177, 7%), PIK3CA (N = 185, 7%), and CCND1 (N = 104, 4%). An MBRT was recommended for 699/2579 patients (27%), and only 163/2579 patients (6%) received at least one MBRT. Out of the 182 lines of MBRT initiated, 23 (13%) partial responses were observed. However, only 0.9% of the whole cohort experienced an objective response. CONCLUSION: An MBRT was provided for 27% of patients in our study, but only 6% of patients actually received matched therapy with an overall response rate of 0.9%. Molecular screening should not be used at present to guide decision-making in routine clinical practice outside of clinical trials. This trial is registered with ClinicalTrials.gov, number NCT01774409.


Subject(s)
Mutation , Neoplasm Recurrence, Local/diagnosis , Neoplasms/diagnosis , Adult , Biomarkers, Tumor/genetics , Child , Databases, Genetic , Early Detection of Cancer/methods , Female , Humans , Male , Middle Aged , Neoplasm Metastasis , Neoplasm Recurrence, Local/drug therapy , Neoplasm Recurrence, Local/genetics , Neoplasms/drug therapy , Neoplasms/genetics , Neoplasms/pathology , Precision Medicine/methods , Prospective Studies
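The percentages quoted in this abstract can be rechecked from the raw counts it reports. The sketch below does only that arithmetic; the variable names are illustrative, and it assumes the 23 partial responses are the objective responses counted against the whole cohort, which the abstract implies but does not state outright.

```python
# Counts quoted in the abstract above; variable names are illustrative only.
enrolled = 2579           # patients included by the four centers
profiled = 1980           # molecular profiles reviewed by the MTB
recommended = 699         # patients with a recommended MBRT
treated = 163             # patients who received at least one MBRT
lines_started = 182       # lines of MBRT initiated
partial_responses = 23    # partial responses observed


def pct(numerator, denominator):
    """Return a percentage rounded to one decimal place."""
    return round(100 * numerator / denominator, 1)


rates = {
    "profiles reviewed / enrolled": pct(profiled, enrolled),
    "MBRT recommended / enrolled": pct(recommended, enrolled),
    "MBRT received / enrolled": pct(treated, enrolled),
    "partial responses / MBRT lines": pct(partial_responses, lines_started),
    # Assumption: each partial response belongs to a distinct patient.
    "partial responses / enrolled": pct(partial_responses, enrolled),
}

for label, value in rates.items():
    print(f"{label}: {value}%")
```

Rounded to whole numbers, these values agree with the 27%, 6% and 13% given in the abstract, and the last one reproduces the 0.9% response rate for the whole cohort.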
2.
Ann Oncol ; 28(8): 1934-1941, 2017 Aug 01.
Article in English | MEDLINE | ID: mdl-28460011

ABSTRACT

BACKGROUND: Never-smokers and never-drinkers patients (NSND) suffering from oral squamous cell carcinoma (OSCC) are epidemiologically different from smokers drinkers (SD). We therefore hypothesized that they harbored distinct targetable molecular alterations. PATIENTS AND METHODS: Data from The Cancer Genome Atlas (TCGA) (discovery set), Gene Expression Omnibus and Centre Léon Bérard (CLB) (three validation sets) with available gene expression profiles of HPV-negative OSCC from NSND and SD were mined. Protein expression profiles and genomic alterations were also analyzed from TCGA, and a functional pathway enrichment analysis was carried out. Formalin-fixed paraffin-embedded samples from 44 OSCC including 20 NSND and 24 SD treated at CLB were retrospectively collected to perform targeted-sequencing of 2559 transcripts (HTG EdgeSeq system), and CD3, CD4, CD8, IDO1, and PD-L1 expression analyses by immunohistochemistry (IHC). Enrichment of a six-gene interferon-γ signature of clinical response to pembrolizumab (PD-1 inhibitor) was evaluated in each sample from all cohorts, using the single sample gene set enrichment analysis method. RESULTS: A total of 854 genes and 29 proteins were found to be differentially expressed between NSND and SD in TCGA. Functional pathway analysis highlighted an overall enrichment for immune-related pathways in OSCC from NSND, especially involving T-cell activation. Interferon-γ response and PD1 signaling were strongly enriched in NSND. IDO1 and PD-L1 were overexpressed and the score of response to pembrolizumab was higher in NSND than in SD, although the mutational load was lower in NSND. IHC analyses in the CLB cohort evidenced IDO1 and PD-L1 overexpression in tumor cells that was associated with a higher rate of tumor-infiltrating T-cells in NSND compared with SD. CONCLUSION: The main biological and actionable difference between OSCC from NSND and SD lies in the immune microenvironment, suggesting a higher clinical benefit of PD-L1 and IDO1 inhibition in OSCC from NSND.


Subject(s)
B7-H1 Antigen/antagonists & inhibitors , Carcinoma, Squamous Cell/immunology , Indoleamine-Pyrrole 2,3,-Dioxygenase/antagonists & inhibitors , Mouth Neoplasms/immunology , Tumor Microenvironment , Aged , Alcohol Drinking , Alphapapillomavirus/isolation & purification , B7-H1 Antigen/genetics , Carcinoma, Squamous Cell/genetics , Carcinoma, Squamous Cell/virology , Cohort Studies , Female , Gene Expression Profiling , Humans , Indoleamine-Pyrrole 2,3,-Dioxygenase/genetics , Male , Middle Aged , Mouth Neoplasms/genetics , Mouth Neoplasms/virology , Smoking
3.
Blood Cancer J ; 7(4): e555, 2017 04 21.
Article in English | MEDLINE | ID: mdl-28430172

ABSTRACT

The histone methyltransferase EZH2 has an essential role in the development of follicular lymphoma (FL). Recurrent gain-of-function mutations in EZH2 have been described in 25% of FL patients and induce aberrant methylation of histone H3 lysine 27 (H3K27). We evaluated the role of EZH2 genomic gains in FL biology. Using RNA sequencing, Sanger sequencing and SNP-arrays, the mutation status, copy-number and gene-expression profiles of EZH2 were assessed in a cohort of 159 FL patients from the PRIMA trial. Immunohistochemical (IHC) EZH2 expression (n=55) and H3K27 methylation (n=63) profiles were also evaluated. In total, 37% of patients (59/159) harbored an alteration in the EZH2 gene (mutation n=46, gain n=23). Both types of alterations were associated with highly similar transcriptional changes, with increased proliferation programs. An H3K27me3/me2 IHC score fully distinguished mutated from wild-type samples, showing its applicability as surrogate for EZH2 mutation analysis. However, this score did not predict the presence of gains at the EZH2 locus. The presence of an EZH2 genetic alteration was an independent factor associated with a longer progression-free survival (hazard ratio 0.58, 95% confidence interval 0.36-0.93, P=0.025). We propose that the copy-number status of EZH2 should also be considered when evaluating patient stratification and selecting patients for EZH2 inhibitor-targeted therapies.


Subject(s)
Enhancer of Zeste Homolog 2 Protein/genetics , Histone-Lysine N-Methyltransferase/genetics , Lymphoma, Follicular/genetics , Adult , Aged , Cell Line, Tumor , Disease-Free Survival , Female , Gene Expression Regulation, Neoplastic/genetics , Histone Methyltransferases , Humans , Lymphoma, Follicular/drug therapy , Lymphoma, Follicular/pathology , Male , Methylation/drug effects , Middle Aged , Mutation/genetics , Polymorphism, Single Nucleotide/genetics , Sequence Analysis, RNA
4.
Nucleic Acids Res ; 27(17): 3567-76, 1999 Sep 01.
Article in English | MEDLINE | ID: mdl-10446248

ABSTRACT

We analysed the termini of the Bacillus subtilis protein coding sequences, and compared them to those of other genomes. The analysis focused on signals, compositional biases of nucleotides, oligonucleotides, codons and amino acids and mRNA secondary structure. AUG is the preferred start codon in all genomes, independent of their G+C content, and seems to induce less stable mRNA structures. However, it is not conserved between homologous genes, nor is it preferred in highly expressed genes. In B.subtilis the ribosome binding site is very strong. We found that downstream boxes do not seem to exist either in Escherichia coli or in B.subtilis. UAA stop codon usage is correlated with the G+C content and is strongly selected in highly expressed genes. We found less stable mRNA structures at both termini, which we related to mRNA-ribosome and mRNA-release-factor interactions. This pattern seems to impose a peculiar A-rich nucleotide and codon usage bias in these regions. Finally the analysis of all proteins from B.subtilis revealed a similar amino acid bias near both termini of proteins consisting of over-representation of hydrophilic residues. This bias near the stop codon is partially release-factor specific.


Subject(s)
Bacillus subtilis/genetics , Genome, Bacterial , Protein Biosynthesis , Algorithms , Amino Acid Sequence , Amino Acids/analysis , Codon/genetics , Codon, Initiator/genetics , Codon, Terminator/genetics , Escherichia coli/genetics , Molecular Sequence Data , Oligonucleotides/analysis , Peptide Chain Initiation, Translational , RNA, Messenger/analysis
5.
Nucleic Acids Res ; 29(10): 2145-53, 2001 May 15.
Article in English | MEDLINE | ID: mdl-11353084

ABSTRACT

Mycoplasma pulmonis is a wall-less eubacterium belonging to the Mollicutes (trivial name, mycoplasmas) and responsible for murine respiratory diseases. The genome of strain UAB CTIP is composed of a single circular 963 879 bp chromosome with a G + C content of 26.6 mol%, i.e. the lowest reported among bacteria, Ureaplasma urealyticum apart. This genome contains 782 putative coding sequences (CDSs) covering 91.4% of its length and a function could be assigned to 486 CDSs whilst 92 matched the gene sequences of hypothetical proteins, leaving 204 CDSs without significant database match. The genome contains a single set of rRNA genes and only 29 tRNA genes. The replication origin oriC was localized by sequence analysis and by using the G + C skew method. Sequence polymorphisms within stretches of repeated nucleotides generate phase-variable protein antigens whilst a recombinase gene is likely to catalyse the site-specific DNA inversions in major M.pulmonis surface antigens. Furthermore, a hemolysin, secreted nucleases and a glycoprotease are predicted virulence factors. Surprisingly, several of the genes previously reported to be essential for a self-replicating minimal cell are missing in the M.pulmonis genome although this one is larger than the other mycoplasma genomes fully sequenced until now.


Subject(s)
Genome , Mycoplasma/genetics , Mycoplasma/pathogenicity , Respiratory System/microbiology , Animals , Antigens, Bacterial/genetics , Antigens, Bacterial/immunology , Base Composition , Codon, Terminator/genetics , Computational Biology , Evolution, Molecular , Genetic Code , Genomic Library , Humans , Internet , Lipoproteins/genetics , Mice , Molecular Sequence Data , Mutation/genetics , Mycoplasma/immunology , Open Reading Frames/genetics , Polymorphism, Genetic/genetics , RNA, Bacterial/genetics , Recombination, Genetic/genetics , Repetitive Sequences, Nucleic Acid/genetics , Replication Origin/genetics , Virulence/genetics
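The abstract mentions locating oriC with the G + C skew method but gives no details. As a rough illustration only, the sketch below computes a windowed cumulative GC skew, (G - C)/(G + C), and reports where it reaches its minimum, a position commonly used as a first estimate of the replication origin in bacterial chromosomes. The function names, the window size and the choice of the cumulative minimum are assumptions, not the authors' procedure.

```python
def cumulative_gc_skew(sequence, window=1000):
    """Return the running sum of (G - C) / (G + C) over consecutive windows."""
    seq = sequence.upper()
    totals = []
    running = 0.0
    for start in range(0, len(seq), window):
        chunk = seq[start:start + window]
        g = chunk.count("G")
        c = chunk.count("C")
        # Windows without any G or C contribute nothing to the skew.
        skew = (g - c) / (g + c) if (g + c) else 0.0
        running += skew
        totals.append(running)
    return totals


def skew_minimum(sequence, window=1000):
    """Return the coordinate (end of window) where cumulative skew is lowest."""
    totals = cumulative_gc_skew(sequence, window)
    if not totals:
        raise ValueError("sequence is empty")
    index = min(range(len(totals)), key=totals.__getitem__)
    return min((index + 1) * window, len(sequence))


# Toy sequence: a C-rich half followed by a G-rich half.
toy = "ACCT" * 50 + "AGGT" * 50
print(skew_minimum(toy, window=20))
```

On a real genome the sequence would be read from a FASTA file and the window would be far larger; because the chromosome is circular, the estimate is usually cross-checked against other evidence, as the abstract's mention of additional sequence analysis suggests.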
6.
J Mol Biol ; 302(4): 797-809, 2000 Sep 29.
Article in English | MEDLINE | ID: mdl-10993724

ABSTRACT

The canonical double-helix form of DNA is thought to predominate both in dilute solution and in living cells. Sequence-dependent fluctuations in local DNA shape occur within the double helix. Besides these relatively modest variations in shape, more extreme and remarkable structures have been detected in which some bases become unpaired. Examples include unusual three-stranded structures such as H-DNA. Certain RNA and DNA strands can also fold onto themselves to form intrastrand triplexes. Although they have been extensively studied in vitro, it remains unknown whether nucleic acid triplexes play natural roles in cells. If natural nucleic acid triplexes were identified in cells, much could be learned by examining the formation, stabilization, and function of such structures. With these goals in mind, we adapted a pattern-recognition program to search genetic databases for a type of potential triplex structure whose presence in genomes has not been previously investigated. We term these sequences Potential Intrastrand Triplex (PIT) elements. The formation of an intrastrand triplex requires three consecutive sequence domains with appropriate symmetry along a single nucleic acid strand. It is remarkable that we discovered multiple copies of sequence elements with the potential to form one particular class of intrastrand triplexes in the fully sequenced genomes of several bacteria. We then focused on the characterization of the 25 copies of a particular approximately 37 nt PIT sequence detected in Escherichia coli. Through biochemical studies, we demonstrate that an isolated DNA strand from this family of E. coli PIT elements forms a stable intrastrand triplex at physiological temperature and pH in the presence of physiological concentrations of Mg(2+).


Subject(s)
Computational Biology/methods , DNA/chemistry , DNA/genetics , Escherichia coli/genetics , Genome, Bacterial , Nucleic Acid Conformation , Algorithms , Base Sequence , Chromosomes, Bacterial/genetics , DNA/classification , DNA/metabolism , DNA, Bacterial/chemistry , DNA, Bacterial/genetics , DNA, Bacterial/metabolism , Databases, Factual , Genes, Bacterial/genetics , Genomics/methods , Hot Temperature , Hydrogen-Ion Concentration , Magnesium/metabolism , Molecular Sequence Data , Nucleic Acid Denaturation , Oligodeoxyribonucleotides/chemistry , Oligodeoxyribonucleotides/genetics , Oligodeoxyribonucleotides/metabolism , Pattern Recognition, Automated , Physical Chromosome Mapping , Regulatory Sequences, Nucleic Acid/genetics , Sequence Alignment , Software , Spectrophotometry, Ultraviolet
7.
Gene ; 165(1): GC37-51, 1995 Nov 07.
Article in English | MEDLINE | ID: mdl-7489895

ABSTRACT

Analysis of the huge volume of data generated by large scale sequencing projects requires the construction of new, sophisticated computer systems. These systems should be able to manage the biological data as well as the results of their analysis. They should also help the user to choose the most appropriate methods, and to string them together in order to solve a global analysis task. In this paper we present the prototype of a software system providing an environment for the analysis of large-scale sequence data. As a first step toward this end, this environment has been put to the test within the Bacillus subtilis genome sequencing project. This system integrates both the descriptive knowledge of the entities involved (genes, regulatory signals and the like) and the methodological knowledge comprising an extensible set of analytical methods. A knowledge representation based on two existing object-oriented models is used to implement this integrated system. In addition, the present prototype provides a suitable user interface both for displaying simultaneously the results generated by several methods and for interacting with the objects. We present in this paper the analysis of a B. subtilis genome fragment, present in data libraries but not annotated. Annotation of the genes present in the fragment allowed us to combine the results of several methods used for predicting coding sequences, and to characterize it as comprising a cryptic phage, the skin element. Comparison between the annotation of the skin element and a standard region of the chromosome indicated that local features of the nucleotide sequence could discriminate between phage and non-phage DNA sequence.


Subject(s)
Bacillus subtilis/genetics , Genome, Bacterial , Sequence Analysis , Base Sequence , Molecular Sequence Data , Software
8.
J Comput Biol ; 5(1): 41-56, 1998.
Article in English | MEDLINE | ID: mdl-9541870

ABSTRACT

In this paper, we present an algorithm to find three-dimensional substructures common to two or more molecules. The basic algorithm is devoted to pairwise structural comparison. Given two sets of atomic coordinates, it finds the largest subsets of atoms which are "similar" in the sense that all internal distances are approximately conserved. The basic idea of the algorithm is to recursively build subsets of increasing sizes, combining two sets of size k to build a set of size k + 1. The algorithm can be used "as is" for small molecules or local parts of proteins (about 30 atoms). When a high number of atoms is involved, we use a two step procedure. First we look for common "local" fragments by using the previous algorithm, and then we gather these fragments by using a Branch and Bound technique. We also extend the basic algorithm to perform multiple comparisons, by using one of the structures as a reference point (pivot) to which all other structures are compared. The solution is the largest subsets of atoms common to the pivot and at least q other structures. Although both algorithms are theoretically exponential in the number of atoms, experiments performed on biological data and using realistic parameters show that the solution is obtained within a few minutes. Finally, an application to the determination of the structural core of seven globins is presented.


Subject(s)
Protein Structure, Tertiary , Algorithms , Amino Acid Sequence , Animals , Computers , Globins/chemistry , Models, Molecular , Molecular Sequence Data , Sequence Alignment , Software
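The similarity criterion in this abstract (two atom subsets are "similar" when all internal distances are approximately conserved) can be sketched as a simple check. This is only the acceptance test for one candidate pairing, not the recursive subset-building or Branch and Bound search the abstract describes; the function name, the tolerance value and the assumption that atoms are already paired by index are all hypothetical.

```python
from itertools import combinations
from math import dist


def distances_conserved(coords_a, coords_b, tolerance=0.5):
    """Check that every internal distance in A matches the one in B.

    coords_a[i] is assumed to correspond to coords_b[i]; the tolerance is
    in the same units as the coordinates (for example angstroms).
    """
    if len(coords_a) != len(coords_b):
        raise ValueError("both subsets must contain the same number of atoms")
    for i, j in combinations(range(len(coords_a)), 2):
        d_a = dist(coords_a[i], coords_a[j])
        d_b = dist(coords_b[i], coords_b[j])
        if abs(d_a - d_b) > tolerance:
            return False
    return True


triangle = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
shifted = [(x + 5.0, y + 5.0, z + 5.0) for x, y, z in triangle]
stretched = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (0.0, 1.0, 0.0)]

print(distances_conserved(triangle, shifted))    # rigid translation
print(distances_conserved(triangle, stretched))  # one edge lengthened
```

Because only internal distances are compared, the test is unaffected by translation and rotation, which is why no superposition step is needed; it also cannot distinguish a structure from its mirror image.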
9.
Res Microbiol ; 150(9-10): 725-33, 1999.
Article in English | MEDLINE | ID: mdl-10673010

ABSTRACT

Most recently published complete bacterial genomes have revealed unexpectedly high numbers of long strict repeats. In this article we discuss the various functional and evolutionary roles of these repeats, focusing in particular on their role in terms of genome stability, gene transfer, and antigenic variation.


Subject(s)
Bacteria/genetics , Genome, Bacterial , Minisatellite Repeats/physiology , Antigenic Variation , Bacillus subtilis/genetics , Bacteria/immunology , Bacteria/pathogenicity , DNA, Bacterial/genetics , Evolution, Molecular , Repetitive Sequences, Nucleic Acid , Transformation, Bacterial
10.
J Biotechnol ; 78(3): 209-19, 2000 Mar 31.
Article in English | MEDLINE | ID: mdl-10751682

ABSTRACT

As bacterial genome sequences accumulate, more and more pieces of data suggest that there is a significant correlation between the distribution of genes along the chromosome and the physical architecture of the cell, suggesting that the map of the cell is in the chromosome. Considering sequences and experimental data indicative of cell compartmentalisation, mRNA folding and turnover, as well as known structural features of protein and membrane complexes, we show that preliminary in silico analysis of whole genome sequences strongly substantiates this hypothesis. If there is a correlation between the genome sequence and the cell architecture, it must derive from some selection pressure in the organisms growing in the wild. As a consequence, the underlying constraints should be optimised in genetically modified organisms if one is to expect high product yields. Consequences in terms of gene expression for biotechnology are straightforward: knocking genes out and in genomes should not be randomly performed, but should follow the rules of chromosome organisation.


Subject(s)
Bacteria/genetics , Chromosomes, Bacterial/genetics , Genes, Bacterial , Biotechnology , Cell Compartmentation , Codon/genetics , Gene Expression , Genome, Bacterial , Models, Genetic , Operon
11.
Phytochemistry ; 31(9): 3177-81, 1992 Sep.
Article in English | MEDLINE | ID: mdl-1368414

ABSTRACT

Six saponins have been isolated and identified from the leaves of Steganotaenia araliacea. They were identified as 3-O-[beta-D-galactopyranosyl(1-->2)-(beta-D-galactopyranosyl (1-->3))-beta-D-glucuronopyranosyl]-21-O-tigloyl and -21-O-angeloyl-R1-barrigenol, 3-O-[beta-D-glucopyranosyl(1-->2)-(beta-D-xylopyranosyl (1-->3))-beta-D-glucuronopyranosyl]-21-O-tigloyl and -21-O-angeloyl-R1-barrigenol, 3-O-[beta-D-glucopyranosyl(1-->2)-(beta-D-glucopyranosyl-(1-->3))-(alpha-L-rhamnopyranosyl(1-->4))-beta-D-glucopyranosyl] steganogenin and 3-O-[(beta-D-galactopyranosyl(1-->2)-beta-D-glucuronopyranosyl]-28-O-beta-D-glucopyranosyl olean-12-ene-28-oic acid. Steganogenin is a new 17,22-seco-oleanolic acid derivative. The structures of the saponins were established by analysis of their 1H and 13C NMR spectra with the help of 2D-experiments and by Californium Plasma Desorption Mass Spectrometry.


Subject(s)
Plants/chemistry , Saponins/isolation & purification , Carbohydrate Sequence , Magnetic Resonance Spectroscopy , Molecular Sequence Data , Molecular Structure , Saponins/chemistry
12.
Phytochemistry ; 31(10): 3571-6, 1992 Oct.
Article in English | MEDLINE | ID: mdl-1368864

ABSTRACT

Two bioactive saponins were isolated from the stem bark of Petersianthus macrocarpus. Their structures were elucidated by chemical degradations and by a combination of 2D NMR techniques and by Californium plasma desorption mass spectrometry. They are 3-O-([beta-D-galactopyranosyl (1-->2)][beta-D-galactopyranosyl (1-->3)]-beta-D-glucuronopyranosyl)-21-O-[3-(3-tigloyloxynilic acid)-4-tigloyloxy-alpha-L-arabinopyranosyl] barringtogenol C and 3-O-([beta-D-galactopyranosyl (1-->2)][beta-D-galactopyranosyl (1-->3)]-beta-D-glucuronopyranosyl)-28-O-alpha-L-rhamnopyranosyl barringtogenol C-21-O-benzoate. The absolute configuration of nilic acid was determined by partial synthesis. 3,3'-Dimethoxy ellagic acid and 3,3'-dimethoxy-4-O-beta-D-glucopyranosyl ellagic acid were also isolated.


Subject(s)
Plants, Medicinal/chemistry , Saponins/isolation & purification , Carbohydrate Sequence , Magnetic Resonance Spectroscopy , Molecular Sequence Data , Molecular Structure , Saponins/chemistry
13.
Int J Radiat Biol ; 57(5): 903-18, 1990 May.
Article in English | MEDLINE | ID: mdl-1970993

ABSTRACT

Near-ultraviolet photolysis of 2'-deoxycytidine (dCyd) and 3-carbethoxypsoralen (3-CPs) in the dry state was found to generate two main stable photoadducts which were separated by thin-layer and high-performance liquid chromatography. Fast atom bombardment and plasma desorption mass spectrometry analyses suggested that the bound molecule to 3-CPs is dCyd. These two compounds were found to produce the corresponding 2'-deoxyuridine (dUrd) derivatives through a deamination process when left in aqueous solutions with a lifetime close to 24 h at 20 degrees C. The chemical structure of the deaminated photoadducts was confirmed by photochemical synthesis using dUrd as the substrate. UV and fluorescent measurements indicated that the furan moiety of 3-CPs is involved in the photobinding reaction. The cyclobutane type structure of the modified dUrd derivatives was established on the basis of its photoreversibility and detailed 1H NMR analysis. The cis-syn stereoconfiguration of the two photocycloadducts was inferred from coupling constant considerations and on the basis of the complete assignment of the cyclobutyl protons, requiring the synthesis of deuterated nucleosides at pyrimidine carbon C(6). Further confirmation of the diastereoisomeric relationship between the two cis-syn dUrd <54' 65'> 3-CPs was provided by circular dichroism measurements.


Subject(s)
Deoxycytidine/radiation effects , Deoxyuridine/radiation effects , Furocoumarins/radiation effects , Ultraviolet Rays , Photochemistry , Photochemotherapy
14.
J Photochem Photobiol B ; 2(3): 321-39, 1988 Nov.
Article in English | MEDLINE | ID: mdl-3148697

ABSTRACT

The near-UV-induced photoreaction of the bifunctional 8-methoxypsoralen (8-MOP) with 2'-deoxyadenosine (dAdo) was investigated in the dry state. Four main monoadducts of 8-MOP to 2'-deoxyadenosine were separated by high performance liquid chromatography and subsequently characterized by soft ionization mass spectrometry (fast atom bombardment and plasma desorption mass spectrometries) and extensive 1H NMR analysis including nuclear Overhauser effect (NOE) measurements. These new types of furocoumarin-nucleic acid component which appear to be specific to 2'-deoxyadenosine were shown to result from recombination of the 3,4-dihydropyron-4-yl radical of 8-MOP with 2'-deoxyadenosyl radical either at the 1' or the 5' position.


Subject(s)
DNA , Deoxyadenosines , Methoxsalen , Photochemistry , Chromatography, High Pressure Liquid , Magnetic Resonance Spectroscopy , Mass Spectrometry
16.
Comput Appl Biosci ; 6(2): 71-80, 1990 Apr.
Article in English | MEDLINE | ID: mdl-2361187

ABSTRACT

In this paper, we present methods to detect and localize patterns in biologically related protein sequences (family). The patterns common to the sequences of the family are detected by using Fourier analysis. No previous scales (codes) are needed, they are actually produced as a result of the analysis procedure, together with the frequencies of the Fourier decompositions. Characteristic features of the family are thus expressed as (code-frequency) pairs. Various tools are proposed in order to localize the patterns, to compare the codes, and to evaluate the proximity of an arbitrary sequence to the investigated family. The general strategy is illustrated on a family composed of calcium-binding proteins.


Subject(s)
Amino Acid Sequence , Signal Processing, Computer-Assisted , Fourier Analysis , Pattern Recognition, Automated , Proteins
17.
Comput Appl Biosci ; 7(1): 31-8, 1991 Jan.
Article in English | MEDLINE | ID: mdl-2004272

ABSTRACT

In previous work, we have shown that a set of characteristics, defined as (code frequency) pairs, can be derived from a protein family by the use of a signal-processing method. This method enables the location and extraction of sequence patterns by taking into account each (code frequency) pair individually. In the present paper, we propose to extend this method in order to detect and visualize patterns by taking into account several pairs simultaneously. Two 'multifrequency' methods are described. The first one is based on a rewriting of the sequences with new symbols which summarize the frequency information. The second method is based on a clustering of the patterns associated with each pair. Both methods lead to the definition of significant consensus sequences. Some results obtained with calcium-binding proteins and serine proteases are also discussed.


Subject(s)
Calcium-Binding Proteins/genetics , Serine Endopeptidases/genetics , Amino Acid Sequence , Cluster Analysis , Humans , Molecular Sequence Data , Software
18.
Nucleic Acids Res ; 24(8): 1395-403, 1996 Apr 15.
Article in English | MEDLINE | ID: mdl-8628670

ABSTRACT

At the DNA/RNA level, biological signals are defined by a combination of spatial structures and sequence motifs. Until now, few attempts had been made in writing general purpose search programs that take into account both sequence and structure criteria. Indeed, the most successful structure scanning programs are usually dedicated to particular structures and are written using general purpose programming languages through a complex and time consuming process where the biological problem of defining the structure and the computer engineering problem of looking for it are intimately intertwined. In this paper, we describe a general representation of structures, suitable for database scanning, together with a programming language, Palingol, designed to manipulate it. Palingol has specific data types, corresponding to structural elements (basically helices) that can be arranged in any way to form a complex structure. As a consequence of the declarative approach used in Palingol, the user should only focus on 'what to search for' while the language engine takes care of 'how to look for it'. Therefore, it becomes simpler to write a scanning program and the structural constraints that define the required structure are more clearly identified.


Subject(s)
Nucleic Acid Conformation , Programming Languages , Base Sequence , DNA , Databases, Factual , Iron , Molecular Sequence Data , RNA, Bacterial/chemistry , RNA, Transfer/chemistry , Regulatory Sequences, Nucleic Acid
19.
Mol Microbiol ; 5(11): 2629-40, 1991 Nov.
Article in English | MEDLINE | ID: mdl-1779754

ABSTRACT

The DNA sequence data for Escherichia coli deposited in the EMBL library (release 27), together with miscellaneous data obtained from several laboratories, have been localized on an updated and corrected version of the restriction map of the chromosome generated by Kohara et al. (1987) and modified by others. This second update adds a further 500 kbp, increasing the amount of the E. coli chromosome sequenced to about one third of the total: 1510 kbp of sequenced DNA is included in the present data base. The accuracy of the map is assessed, and allows us to propose a precise genetic map position for every sequenced gene. The location of rare-cutting sites such as AvrII, NotI and SfiI have also been included in the update in order to combine the data obtained from different sources into one single file. The distribution of palindromic sequences (to which most restriction sites belong) has been studied in coding sequences. There appears to be a significant counter-selection against several such sequences in E. coli coding sequences (but not in other organisms such as Saccharomyces cerevisiae), suggesting the existence of constraints on DNA structure in E. coli, perhaps indicative of a functional role for horizontal gene transfer, preserving coding sequences, in this type of bacteria.


Subject(s)
Escherichia coli/genetics , Genes, Bacterial , Genome , Base Sequence , Chromosome Mapping , Chromosomes, Bacterial , DNA, Bacterial/genetics , Databases, Factual , Restriction Mapping
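The palindromic sequences studied in this abstract are sites that read the same as their reverse complement, as most restriction sites do. The sketch below shows one way to test a site and to tally such words in a sequence; it only counts occurrences and does not reproduce the statistical comparison behind the counter-selection result. All function names and the default word length of six are assumptions made for illustration.

```python
# Translation table mapping each base to its complement.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def is_palindromic(site):
    """True when the site equals its own reverse complement."""
    return site.upper() == reverse_complement(site)


def count_palindromes(sequence, length=6):
    """Count each palindromic word of the given length in the sequence."""
    seq = sequence.upper()
    counts = {}
    for i in range(len(seq) - length + 1):
        word = seq[i:i + length]
        # Skip words containing ambiguous bases such as N.
        if set(word) <= set("ACGT") and is_palindromic(word):
            counts[word] = counts.get(word, 0) + 1
    return counts


print(is_palindromic("GCGGCCGC"))       # NotI recognition site
print(count_palindromes("AAGAATTCAA"))  # contains one GAATTC word
```

To judge whether a word is avoided, the observed counts would need to be compared with an expectation derived from the base or oligonucleotide composition of the coding sequences, a step this sketch leaves out.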
20.
Microbiol Rev ; 57(3): 623-54, 1993 Sep.
Article in English | MEDLINE | ID: mdl-8246843

ABSTRACT

Several data libraries have been created to organize all the data obtained worldwide about the Escherichia coli genome. Because the known data now amount to more than 40% of the whole genome sequence, it has become necessary to organize the data in such a way that appropriate procedures can associate knowledge produced by experiments about each gene to its position on the chromosome and its relation to other relevant genes, for example. In addition, global properties of genes, affected by the introduction of new entries, should be present as appropriate description fields. A data base, implemented on Macintosh by using the data base management system 4th Dimension, is described. It is constructed around a core constituted by known contigs of E. coli sequences and links data collected in general libraries (unmodified) to data associated with evolving knowledge (with modifiable fields). Biologically significant results obtained through the coupling of appropriate procedures (learning or statistical data analysis) are presented. The data base is available through a 4th Dimension runtime and through FTP on Internet. It has been regularly updated and will be systematically linked to other E. coli data bases (M. Kroger, R. Wahl, G. Schachtel, and P. Rice, Nucleic Acids Res. 20(Suppl.):2119-2144, 1992; K. E. Rudd, W. Miller, C. Werner, J. Ostell, C. Tolstoshev, and S. G. Satterfield, Nucleic Acids Res. 19:637-647, 1991) in the near future.


Subject(s)
Databases, Factual , Escherichia coli/genetics , Genome, Bacterial , Bacterial Proteins/genetics , Base Sequence , Chromosome Mapping , Chromosomes, Bacterial , DNA Replication , Data Display , Database Management Systems , Genes, Bacterial , Models, Theoretical , Molecular Sequence Data , Transcription, Genetic