Search | Virtual Health Library (Biblioteca Virtual em Saúde)

Statistico-syntactic learning techniques.

Soldano, H; Moisy, J L.

Biochimie ; 67(5): 493-8, 1985 May.

Article in English | MEDLINE | ID: mdl-3839691

ABSTRACT

The methods of "learning from examples" make it possible to solve classification problems: discriminating between two classes of objects, or assigning an object to a class of objects that represents a property. They are used when no decision procedure is known a priori but a sufficient number of examples is available. After a learning stage on the examples, a procedure that solves the problem is built. In the methodology presented here, an object is described by a list of attributes, and the acquired knowledge consists of sets of "rules", each considered as an argument in favour of a particular decision.

Subjects

Computers; Learning; Software; Discrimination Learning; Logic; Mathematics; Statistics as Topic
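The idea of rules acting as "arguments in favour of a particular decision" can be illustrated with a small sketch. This is a hypothetical illustration, not the authors' procedure: it assumes objects are attribute-value dictionaries, treats each rule as a conjunction of attribute tests, and decides by a simple majority of firing rules (abstaining on ties). The names `fires` and `decide` and the toy rules are invented for this example.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple

# Hypothetical encoding (not from the paper): an object is a dict of
# attribute -> value; a rule is (conditions, decision), where conditions
# maps attributes to the values they must take.
Rule = Tuple[Dict[str, str], str]


def fires(rule: Rule, obj: Dict[str, str]) -> bool:
    """A rule fires when all of its attribute conditions hold for the object."""
    conditions, _ = rule
    return all(obj.get(attr) == value for attr, value in conditions.items())


def decide(rules: List[Rule], obj: Dict[str, str]) -> Optional[str]:
    """Count every firing rule as one argument for its decision and return the
    best-supported decision; return None if no rule fires or support is tied."""
    votes = Counter(rule[1] for rule in rules if fires(rule, obj))
    if not votes:
        return None
    ranked = votes.most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None  # equal support for the top two decisions: abstain
    return ranked[0][0]


# Toy rules, as if produced by a learning stage on examples.
rules: List[Rule] = [
    ({"shape": "round"}, "A"),
    ({"shape": "round", "color": "red"}, "A"),
    ({"color": "red"}, "B"),
]
print(decide(rules, {"shape": "round", "color": "red"}))  # A (two arguments against one)
```

Majority voting is only one way to weigh the arguments; weighted or thresholded combinations would fit the same description.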

Pairwise and multiple identification of three-dimensional common substructures in proteins.

Escalier, V; Pothier, J; Soldano, H; Viari, A.

J Comput Biol ; 5(1): 41-56, 1998.

Article in English | MEDLINE | ID: mdl-9541870

ABSTRACT

In this paper, we present an algorithm to find three-dimensional substructures common to two or more molecules. The basic algorithm is devoted to pairwise structural comparison. Given two sets of atomic coordinates, it finds the largest subsets of atoms that are "similar" in the sense that all internal distances are approximately conserved. The basic idea of the algorithm is to recursively build subsets of increasing size, combining two sets of size k to build a set of size k + 1. The algorithm can be used "as is" for small molecules or local parts of proteins (about 30 atoms). When a large number of atoms is involved, we use a two-step procedure: first we look for common "local" fragments using the previous algorithm, and then we assemble these fragments using a branch-and-bound technique. We also extend the basic algorithm to perform multiple comparisons by using one of the structures as a reference point (pivot) to which all other structures are compared. The solution consists of the largest subsets of atoms common to the pivot and at least q other structures. Although both algorithms are theoretically exponential in the number of atoms, experiments performed on biological data with realistic parameters show that the solution is obtained within a few minutes. Finally, an application to the determination of the structural core of seven globins is presented.

Subjects

Protein Structure, Tertiary; Algorithms; Amino Acid Sequence; Animals; Computers; Globins/chemistry; Models, Molecular; Molecular Sequence Data; Sequence Alignment; Software
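The pairwise criterion in the abstract above (all internal distances approximately conserved) and the "combine two sets of size k into a set of size k + 1" idea can be sketched as an exhaustive level-wise search. This is a naive sketch under stated assumptions, not the published algorithm: a candidate is a tuple of (atom in A, atom in B) matches, "approximately" means an absolute tolerance `eps`, and the fragment-assembly and branch-and-bound stages are omitted. The function names are hypothetical.

```python
import itertools
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]
Match = Tuple[int, int]  # (atom index in structure A, atom index in structure B)


def compatible(a: Sequence[Point], b: Sequence[Point], m1: Match, m2: Match, eps: float) -> bool:
    """Two matches agree if they use distinct atoms on each side and the
    corresponding internal distances in A and B differ by at most eps."""
    (i1, j1), (i2, j2) = m1, m2
    if i1 == i2 or j1 == j2:
        return False
    return abs(math.dist(a[i1], a[i2]) - math.dist(b[j1], b[j2])) <= eps


def largest_common_subsets(a: Sequence[Point], b: Sequence[Point], eps: float) -> List[Tuple[Match, ...]]:
    """Exhaustive level-wise search: two valid sets of size k that share their
    first k-1 matches are combined into a set of size k+1 when their last
    matches are compatible. Returns all sets of the largest size found."""
    level = [(m,) for m in itertools.product(range(len(a)), range(len(b)))]
    best = level
    while level:
        nxt = []
        for s, t in itertools.combinations(level, 2):
            if s[:-1] == t[:-1] and compatible(a, b, s[-1], t[-1], eps):
                nxt.append(s + (t[-1],))
        if nxt:
            best = nxt
        level = nxt
    return best


# A 3-4-5 right triangle, and the same triangle translated plus one distant atom.
A = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (0.0, 4.0, 0.0)]
B = [(10.0, 10.0, 10.0), (13.0, 10.0, 10.0), (10.0, 14.0, 10.0), (50.0, 50.0, 50.0)]
print(largest_common_subsets(A, B, eps=0.1))  # [((0, 0), (1, 1), (2, 2))]
```

Because each joined set keeps its matches in sorted order and both parents are already pairwise compatible, checking only the two last matches is enough to keep every internal distance conserved. The search is exponential, in line with the abstract's remark, so it is only meant for toy inputs.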

From data banks to data bases.

Danchin, A; Médigue, C; Gascuel, O; Soldano, H; Hénaut, A.

Res Microbiol ; 142(7-8): 913-6, 1991.

Article in English | MEDLINE | ID: mdl-1784830

ABSTRACT

The information on nucleotide and protein sequences collected in national and international libraries cannot be handled directly by existing software. We therefore evaluated the feasibility of constructing a data base for Escherichia coli from the data present in the banks. The know-how thus acquired was applied to Bacillus subtilis. Specific examples of the general procedure are given.

Subjects

Bacillus subtilis/ultrastructure; Chromosomes, Bacterial/ultrastructure; DNA, Bacterial/ultrastructure; Databases, Factual; Escherichia coli/ultrastructure; Bacillus subtilis/genetics; Base Sequence/genetics; DNA, Bacterial/genetics; Database Management Systems; Databases, Bibliographic; Escherichia coli/genetics; In Vitro Techniques; Molecular Sequence Data

A new method to predict the consensus secondary structure of a set of unaligned RNA sequences.

Bouthinon, D; Soldano, H.

Bioinformatics ; 15(10): 785-98, 1999 Oct.

Article in English | MEDLINE | ID: mdl-10705432

ABSTRACT

MOTIVATION: To predict the consensus secondary structure, possibly including pseudoknots, of a set of unaligned RNA sequences. RESULTS: We have designed a method based on a new representation of any RNA secondary structure as a set of structural relationships between the helices of the structure. We refer to this representation as a structural pattern. In a first step, we use thermodynamic parameters to select, for each sequence, the best secondary structures according to energy minimization, and we represent each of them by its corresponding structural pattern. In a second step, we search for the repeated structural patterns, i.e. the largest structural patterns that occur in every sequence, that is, patterns included in at least one of the structural patterns associated with each sequence. Thanks to an efficient encoding of structural patterns, this search comes down to identifying the largest repeated word suffixes in a dictionary. In a third step, we compute the plausibility of each repeated structural pattern by checking whether it occurs more frequently in the studied sequences than in random RNA sequences. We then assume that the consensus secondary structure corresponds to the repeated structural pattern with the highest plausibility. We present several experiments on tRNA, fragments of 16S rRNA and 10Sa RNA (including pseudoknots); in each of them, we found the putative consensus secondary structure.

Subjects

Computational Biology; Nucleic Acid Conformation; RNA/chemistry; RNA/genetics; Algorithms; Base Sequence; Consensus Sequence; Escherichia coli/chemistry; Escherichia coli/genetics; Molecular Sequence Data; RNA, Bacterial/chemistry; RNA, Bacterial/genetics; RNA, Ribosomal, 16S/chemistry; RNA, Ribosomal, 16S/genetics; RNA, Transfer/chemistry; RNA, Transfer/genetics; Repetitive Sequences, Nucleic Acid; Thermodynamics
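The second step of the abstract above (finding the largest structural patterns included in at least one candidate pattern of each sequence) can be sketched with plain set operations. This is a brute-force sketch under stated assumptions, not the paper's suffix-based dictionary search: a structural pattern is modelled as a frozenset of hypothetical (helix, relation, helix) triples, and the relation names and helix labels are invented for illustration.

```python
import itertools
from typing import FrozenSet, Sequence, Set, Tuple

# Hypothetical encoding: a relationship between two helices is a
# (helix, relation, helix) triple; a structural pattern is a set of them.
Relation = Tuple[str, str, str]
Pattern = FrozenSet[Relation]


def is_repeated(pattern: Pattern, per_sequence: Sequence[Sequence[Pattern]]) -> bool:
    """True if, for every sequence, the pattern is included in at least one of
    the structural patterns selected for that sequence."""
    return all(any(pattern <= cand for cand in cands) for cands in per_sequence)


def largest_repeated(per_sequence: Sequence[Sequence[Pattern]]) -> Set[Pattern]:
    """Brute force, exponential in the number of sequences: any repeated
    pattern lies inside the intersection of one candidate per sequence, and
    each such intersection is itself repeated, so the largest repeated
    patterns are the largest of these intersections."""
    if not per_sequence or any(len(cands) == 0 for cands in per_sequence):
        return set()
    commons = {frozenset.intersection(*choice) for choice in itertools.product(*per_sequence)}
    size = max(len(p) for p in commons)
    return {p for p in commons if len(p) == size}


# Hypothetical helix relationships.
b12 = ("H1", "before", "H2")
n23 = ("H2", "nested_in", "H3")
b34 = ("H3", "before", "H4")
per_sequence = [
    [frozenset({b12, n23}), frozenset({b12})],  # low-energy candidates, sequence 1
    [frozenset({b12, n23, b34})],               # sequence 2
    [frozenset({b12, n23}), frozenset({b34})],  # sequence 3
]
print(largest_repeated(per_sequence))
```

Here the pattern {b12, n23} is the only one of maximal size shared by all three sequences. The plausibility step (comparison with random RNA sequences) is not modelled.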

A scale-independent signal processing method for sequence analysis.

Viari, A; Soldano, H; Ollivier, E.

Comput Appl Biosci ; 6(2): 71-80, 1990 Apr.

Article in English | MEDLINE | ID: mdl-2361187

ABSTRACT

In this paper, we present methods to detect and localize patterns in a family of biologically related protein sequences. The patterns common to the sequences of the family are detected using Fourier analysis. No predefined scales (codes) are needed: they are produced by the analysis procedure itself, together with the frequencies of the Fourier decompositions. Characteristic features of the family are thus expressed as (code-frequency) pairs. Various tools are proposed to localize the patterns, to compare the codes, and to evaluate the proximity of an arbitrary sequence to the investigated family. The general strategy is illustrated on a family of calcium-binding proteins.

Subjects

Amino Acid Sequence; Signal Processing, Computer-Assisted; Fourier Analysis; Pattern Recognition, Automated; Proteins
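The basic operation behind a (code-frequency) pair, turning a protein sequence into a numeric signal with a code and reading off its Fourier spectrum, can be sketched as follows. In the paper the codes are produced by the analysis; here the code is fixed in advance and hypothetical (a binary code marking leucine), and frequencies are assumed to be expressed in cycles per residue. The function name `power_spectrum` is invented for this example.

```python
import cmath
import math
from typing import Dict, List, Tuple


def power_spectrum(sequence: str, code: Dict[str, float]) -> List[Tuple[float, float]]:
    """Encode the sequence with a numeric code (residues absent from the code
    map to 0.0) and return (frequency, power) for frequencies k/N, k = 1..N//2,
    in cycles per residue, using a direct discrete Fourier transform."""
    x = [code.get(aa, 0.0) for aa in sequence]
    n = len(x)
    mean = sum(x) / n
    x = [v - mean for v in x]  # remove the constant (zero-frequency) component
    spectrum = []
    for k in range(1, n // 2 + 1):
        f = k / n
        coeff = sum(v * cmath.exp(-2j * math.pi * f * t) for t, v in enumerate(x))
        spectrum.append((f, abs(coeff) ** 2 / n))
    return spectrum


# A toy sequence in which leucine recurs with a period of 4 residues.
spec = power_spectrum("LLAA" * 5, {"L": 1.0})
peak_f, peak_p = max(spec, key=lambda fp: fp[1])
print(peak_f)  # 0.25, i.e. a period of 4 residues
```

A pair such as (leucine code, 0.25) would then summarize this periodicity; searching over codes as well as frequencies, as the abstract describes, is left out of the sketch.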

'Multifrequency' location and clustering of sequence patterns from proteins.

Ollivier, E; Soldano, H; Viari, A.

Comput Appl Biosci ; 7(1): 31-8, 1991 Jan.

Article in English | MEDLINE | ID: mdl-2004272

ABSTRACT

In previous work, we showed that a set of characteristics, defined as (code-frequency) pairs, can be derived from a protein family using a signal-processing method. This method enables the location and extraction of sequence patterns by taking each (code-frequency) pair into account individually. In the present paper, we extend this method to detect and visualize patterns by taking several pairs into account simultaneously. Two 'multifrequency' methods are described. The first is based on rewriting the sequences with new symbols that summarize the frequency information. The second is based on clustering the patterns associated with each pair. Both methods lead to the definition of significant consensus sequences. Some results obtained with calcium-binding proteins and serine proteases are also discussed.

Subjects

Calcium-Binding Proteins/genetics; Serine Endopeptidases/genetics; Amino Acid Sequence; Cluster Analysis; Humans; Molecular Sequence Data; Software

Sequence analysis of cell cycle control (cdc2) protein kinases among protein serine/threonine kinases.

Guerrucci, M A; Soldano, H; Bellé, R.

Biol Cell ; 70(1-2): 1-8, 1990.

Article in English | MEDLINE | ID: mdl-2150765

ABSTRACT

Among protein serine/threonine kinases, the CDC2 proteins are both well characterized as protein serine/threonine kinases and functionally involved in the control of cell division. Protein serine/threonine kinase sequences were analysed using the Fourier transform of the coded sequences. Characteristic code/frequency pairs were extracted from a set of well-defined protein serine/threonine kinases. The characteristic frequencies 0.179, 0.250 and 0.408 distinguished protein serine/threonine kinases from proteins lacking this biological activity. The sequence patterns responsible for detection of the code/frequency pairs were searched for and found to correlate with the putative catalytic domain of the proteins. The protein serine/threonine kinases involved in cell division control, the CDC2 protein kinases, were compared with the other protein serine/threonine kinases. Specific code/frequency pairs were extracted from the sequences and could be related to the function or regulation of the kinases in cell division. Two CDC2-related proteins, CDC2(Mm) from mouse and CDC2(Gg) from chicken, were shown to fit well with the CDC2 proteins, whereas KIN28, PHO85 and PSKJ3, which share sequence homology but not functional activity with the CDC2 proteins, were clearly excluded from the CDC2 group by the characteristic code/frequency pairs. Pertinent patterns in the CDC2 proteins were analysed and mapped onto the CDC2-related protein sequences. Four patterns were correlated with the code/frequency detection and could therefore be associated with the regulation of the CDC2-related proteins.

Subjects

CDC2 Protein Kinase/genetics; Amino Acid Sequence; Animals; Cell Cycle/genetics; Fourier Analysis; Molecular Sequence Data; Protein Kinases/genetics; Protein-Serine-Threonine Kinases; Sequence Homology, Nucleic Acid

A distance-based block searching algorithm.

Sagot, M F; Viari, A; Soldano, H.

Proc Int Conf Intell Syst Mol Biol ; 3: 322-31, 1995.

Article in English | MEDLINE | ID: mdl-7584455

ABSTRACT

In this paper, we present an algorithm for the multiple comparison of a set of protein sequences. Our approach is that of peptide matching and consists of looking for all the words that occur approximately in at least q of the sequences in the set, where q is a parameter. Words are compared using a reference object called a model, which is itself a word over the alphabet of amino acids, and the comparison between a model and a word is based on w-length words instead of single symbols. This idea is similar to the one used in the BLAST program for pairwise comparisons. Two w-length words are considered related if a gapless alignment of the two, scored with a similarity matrix, has a score greater than a threshold value t. In our case, we say that a k-length word u is an occurrence of a model m of the same length if every w-length subword of u is related, in the above sense, to the corresponding subword of m. If a model m has occurrences in at least q of the sequences of the set, m is said to occur in the set. In percentage terms, q may correspond to as little as 5% of the sequences (search for recurrent words in a set of non-homologous proteins) or to as much as 70-100% (establishment of a list of all similar words as a first step in a multiple alignment program). The algorithm presented here is an efficient and exact way of finding all the models, of a fixed length k or of the greatest possible length kmax, that occur in a set of sequences. It works with any kind of scoring matrix, and an extension of the algorithm allows gaps to be introduced between a model and its occurrences.

Subjects

Algorithms; Proteins/chemistry; Sequence Homology, Amino Acid; Amino Acid Sequence; Animals; Computer Simulation; Humans; Models, Theoretical; Molecular Sequence Data; Software
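The definitions in the abstract above (related w-length words, occurrence of a model, a model occurring in at least q sequences) can be written out directly. This sketch only checks a given model naively; it is not the paper's efficient, exact enumeration of all models. The identity scoring function is an assumption standing in for a real similarity matrix, and all function names are hypothetical.

```python
from typing import Callable, Sequence

Score = Callable[[str, str], int]


def identity_score(a: str, b: str) -> int:
    """Toy scoring 'matrix' (an assumption): +1 for identical residues, 0 otherwise."""
    return 1 if a == b else 0


def related(u: str, v: str, t: int, score: Score = identity_score) -> bool:
    """Two w-length words are related if their gapless alignment scores above t."""
    return sum(score(a, b) for a, b in zip(u, v)) > t


def is_occurrence(word: str, model: str, w: int, t: int, score: Score = identity_score) -> bool:
    """A k-length word is an occurrence of a k-length model if every w-length
    subword is related to the subword of the model at the same position."""
    if len(word) != len(model):
        return False
    return all(related(word[i:i + w], model[i:i + w], t, score)
               for i in range(len(model) - w + 1))


def occurs_in_set(model: str, sequences: Sequence[str], q: int, w: int, t: int,
                  score: Score = identity_score) -> bool:
    """The model occurs in the set if at least q sequences contain an occurrence."""
    k = len(model)
    hits = sum(
        any(is_occurrence(s[j:j + k], model, w, t, score) for j in range(len(s) - k + 1))
        for s in sequences
    )
    return hits >= q


seqs = ["ACDEFG", "ACDQFG", "WWWWWW"]
print(occurs_in_set("ACDEF", seqs, q=2, w=3, t=1))  # True: exact hit in seq 1, "ACDQF" in seq 2
```

With w = 3 and t = 1, every window of three residues must share at least two identities with the model, so the substitution E to Q in the second sequence is tolerated.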

Finding flexible patterns in a text: an application to three-dimensional molecular matching.

Sagot, M F; Viari, A; Pothier, J; Soldano, H.

Comput Appl Biosci ; 11(1): 59-70, 1995 Feb.

Article in English | MEDLINE | ID: mdl-7796276

ABSTRACT

Finding certain regularities in a text is an important problem in many areas, e.g. in the analysis of biological molecules such as nucleic acids or proteins. In the latter case, the text may be a sequence of amino acids or a linear coding of three-dimensional structures, and the regularities then correspond to lexical or structural motifs common to two or more proteins. We first recall an earlier algorithm that finds these regularities in a flexible way. We then introduce a generalized version of this algorithm designed for the particular case of protein three-dimensional structures, which present a few peculiarities that make them computationally harder to process. Finally, we give some applications of our new algorithm to concrete examples.

Subjects

Algorithms; Pattern Recognition, Automated; Proteins/chemistry; Cytochrome P-450 Enzyme System/chemistry; Databases, Factual; Models, Molecular; Models, Statistical; Molecular Structure; Protein Conformation; Sequence Alignment/methods; Sequence Alignment/statistics & numerical data
