Results 1-20 of 29
1.
Bioinformatics ; 23(16): 2063-72, 2007 Aug 15.
Article in English | MEDLINE | ID: mdl-17540679

ABSTRACT

MOTIVATION: A major challenge in current biomedical research is the identification of cellular processes deregulated in a given pathology through the analysis of gene expression profiles. To this end, predefined lists of genes coding for specific functions are compared with a list of genes ordered according to their differential expression, as measured by suitable univariate statistics. RESULTS: We propose a statistically well-founded method for measuring the relevance of predefined lists of genes and for assessing their statistical significance, starting from their raw expression levels as recorded on the microarray. We use prediction accuracy as a measure of the relevance of a list. The rationale is that a functional category, coded through a list of genes, is perturbed in a given pathology if it is possible to correctly predict the occurrence of the disease in new subjects on the basis of the expression levels of the genes belonging to the list only. The accuracy is estimated with a multiple random validation strategy, and its statistical significance is assessed against two null hypotheses by using two independent permutation tests. The utility of the proposed methodology is illustrated by analyzing the relevance of Gene Ontology terms belonging to the biological process category in colon and prostate cancer, using three different microarray data sets, and by comparing it with current approaches. AVAILABILITY: Source code for the algorithms is available from the author upon request. SUPPLEMENTARY INFORMATION: The colon cancer data set and a complete description of the experimental results are available at: ftp://bioftp:76bioftpxxx@marx.ba.issia.cnr.it/supp-info.htm.
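The abstract gives no code (source is available only on request), so the following is only a minimal sketch of the general idea: prediction accuracy as a relevance score for a gene list, estimated over repeated random train/test splits and checked with a label-permutation test. The nearest-centroid classifier, the split sizes and all function names are assumptions for illustration, and only one of the paper's two permutation tests (shuffling class labels) is sketched.

```python
import random
from statistics import mean


def nearest_centroid_accuracy(X, y, train_idx, test_idx):
    """Fit class centroids on train_idx and return accuracy on test_idx."""
    centroids = {}
    for label in {y[i] for i in train_idx}:
        rows = [X[i] for i in train_idx if y[i] == label]
        centroids[label] = [mean(col) for col in zip(*rows)]

    def sq_dist(a, b):
        return sum((u - v) ** 2 for u, v in zip(a, b))

    correct = 0
    for i in test_idx:
        pred = min(centroids, key=lambda c: sq_dist(X[i], centroids[c]))
        correct += pred == y[i]
    return correct / len(test_idx)


def random_validation_accuracy(X, y, n_splits=20, test_frac=0.3, seed=0):
    """Average test accuracy over repeated random train/test splits."""
    rng = random.Random(seed)
    n = len(y)
    n_test = max(1, int(n * test_frac))
    accs = []
    for _ in range(n_splits):
        idx = list(range(n))
        rng.shuffle(idx)
        accs.append(nearest_centroid_accuracy(X, y, idx[n_test:], idx[:n_test]))
    return mean(accs)


def label_permutation_test(X, y, n_perm=99, seed=0):
    """Compare the observed accuracy with accuracies obtained on shuffled labels."""
    rng = random.Random(seed)
    observed = random_validation_accuracy(X, y, seed=seed)
    hits = 0
    for _ in range(n_perm):
        y_perm = list(y)
        rng.shuffle(y_perm)
        if random_validation_accuracy(X, y_perm, seed=seed) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)
```

Here each row of `X` would hold the expression values of the genes in one list only, and `y` holds 0/1 labels (e.g. normal/tumor); a small p-value suggests the list carries information about the disease.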


Subjects
Biomarkers, Tumor/metabolism , Gene Expression Profiling/methods , Gene Expression Regulation, Neoplastic , Multigene Family , Neoplasm Proteins/metabolism , Neoplasms/metabolism , Oligonucleotide Array Sequence Analysis/methods , Data Interpretation, Statistical , Humans , Male , Neoplasm Proteins/classification
2.
BMC Bioinformatics ; 7: 387, 2006 Aug 19.
Article in English | MEDLINE | ID: mdl-16919171

ABSTRACT

BACKGROUND: In this paper we present a method for the statistical assessment of cancer predictors that make use of gene expression profiles. The methodology is applied to a new microarray gene expression data set collected at Casa Sollievo della Sofferenza Hospital, Foggia, Italy. The data set consists of 22 normal and 25 tumor specimens extracted from 25 patients affected by colon cancer. We address several questions relevant to the automatic diagnosis of cancer, such as: Is the size of the available data set sufficient to build accurate classifiers? What is the statistical significance of the associated error rates? In what ways does accuracy depend on the adopted classification scheme? How many genes are correlated with the pathology, and how many are sufficient for an accurate colon cancer classification? The proposed method answers these questions while avoiding the potential pitfalls hidden in the analysis and interpretation of microarray data. RESULTS: We estimate the generalization error, evaluated through the Leave-K-Out Cross Validation error, for three different classification schemes by varying the number of training examples and the number of genes used. The statistical significance of the error rate is measured using a permutation test. We provide a statistical analysis in terms of the frequencies of the genes involved in the classification. Using the whole set of genes, we found that the Weighted Voting Algorithm (WVA) classifier learns the distinction between normal and tumor specimens with 25 training examples, with an error rate of e = 21% (p = 0.045). This error rate remains constant even when the number of examples increases. In addition, Regularized Least Squares (RLS) and Support Vector Machine (SVM) classifiers can learn with only 15 training examples, with error rates of e = 19% (p = 0.035) and e = 18% (p = 0.037), respectively. Moreover, their error rate decreases as the training set size increases, reaching the best performance with 35 training examples; in this case, RLS and SVM have error rates of e = 14% (p = 0.027) and e = 11% (p = 0.019). Concerning the number of genes, the signal-to-noise statistic identified about 6000 genes (p < 0.05) correlated with the pathology. Moreover, the performance of the RLS and SVM classifiers does not change when 74% of the genes are used, and it degrades progressively, up to e = 16% (p < 0.05), when only 2 genes are employed. The biological relevance of a set of genes determined by our statistical analysis, and the major roles they play in colorectal tumorigenesis, are discussed. CONCLUSIONS: The proposed method provides statistically significant answers to precise questions relevant for the diagnosis and prognosis of cancer. We found that, with as few as 15 examples, it is possible to train statistically significant classifiers for colon cancer diagnosis. As for the number of genes sufficient for a reliable classification of colon cancer, our results suggest that it depends on the required accuracy.
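To make the Weighted Voting and Leave-K-Out ideas concrete, here is a minimal sketch. It assumes the Golub et al. style of weighted voting, where each gene's weight is the signal-to-noise statistic (mean difference divided by the sum of class standard deviations) and each gene votes relative to the midpoint of the class means; whether the paper uses exactly this variant, population standard deviations, or these hypothetical function names is an assumption.

```python
import random
from statistics import mean, pstdev


def train_weighted_voting(X, y, eps=1e-9):
    """Per gene: weight = signal-to-noise ratio, threshold = midpoint of class means."""
    model = []
    for g in range(len(X[0])):
        pos = [row[g] for row, label in zip(X, y) if label == 1]
        neg = [row[g] for row, label in zip(X, y) if label == 0]
        s2n = (mean(pos) - mean(neg)) / (pstdev(pos) + pstdev(neg) + eps)
        model.append((s2n, (mean(pos) + mean(neg)) / 2))
    return model


def predict_weighted_voting(model, x):
    """Each gene votes s2n * (x_g - midpoint); the sign of the total decides the class."""
    total = sum(w * (v - mid) for (w, mid), v in zip(model, x))
    return 1 if total > 0 else 0


def leave_k_out_error(X, y, k, n_rounds=100, seed=0):
    """Average error over random rounds, each holding out k samples for testing."""
    rng = random.Random(seed)
    errors = []
    for _ in range(n_rounds):
        idx = list(range(len(y)))
        rng.shuffle(idx)
        test, train = idx[:k], idx[k:]
        model = train_weighted_voting([X[i] for i in train], [y[i] for i in train])
        wrong = sum(predict_weighted_voting(model, X[i]) != y[i] for i in test)
        errors.append(wrong / k)
    return mean(errors)
```

`k` must leave at least one sample of each class in every training set; the permutation test on the resulting error would follow the same pattern as a label-shuffling test.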


Subjects
Algorithms , Oligonucleotide Array Sequence Analysis/methods , Statistics as Topic/methods , Aged , Colonic Neoplasms/classification , Colonic Neoplasms/genetics , Data Interpretation, Statistical , Female , Gene Expression Profiling/methods , Gene Expression Regulation, Neoplastic/genetics , Humans , Male , Middle Aged , Models, Statistical , Numerical Analysis, Computer-Assisted , Reproducibility of Results , Software
3.
Nucleic Acids Res ; 29(1): 167-8, 2001 Jan 01.
Article in English | MEDLINE | ID: mdl-11125079

ABSTRACT

The PLMItRNA database for mitochondrial tRNA molecules and genes in Viridiplantae (green plants) [Volpetti,V., Gallerani,R., DeBenedetto,C., Liuni,S., Licciulli,F. and Ceci,L.R. (2000) Nucleic Acids Res., 28, 159-162] has been enlarged to include algae. The database now contains 436 genes and 16 tRNA entries from 25 higher plants, eight green algae, four red algae (Rhodophytae) and two Stramenopiles. The PLMItRNA database is accessible via the WWW at http://bio-www.ba.cnr.it:8000/PLMItRNA.


Subjects
DNA, Mitochondrial/genetics , Databases, Factual , Eukaryotic Cells/metabolism , RNA, Transfer/genetics , Eukaryota/genetics , Information Services , Internet , Photosynthesis , Plants/genetics
4.
Nucleic Acids Res ; 30(1): 347-8, 2002 Jan 01.
Article in English | MEDLINE | ID: mdl-11752333

ABSTRACT

PLANT-PIs is a database developed to facilitate retrieval of information on plant protease inhibitors (PIs) and related genes. For each PI, links to sequence databases are reported together with a summary of the functional properties of the molecule (and its mutants) as deduced from literature. PLANT-PIs contains information for 351 plant PIs, plus several isoinhibitors. The database is accessible at http://bighost.area.ba.cnr.it/PLANT-PIs.


Subjects
Databases, Protein , Genes, Plant , Plants/enzymology , Protease Inhibitors/chemistry , Amino Acid Sequence , Binding Sites , DNA Mutational Analysis , DNA, Plant/analysis , Gene Expression , Information Storage and Retrieval , Internet , Plant Proteins/chemistry , Plant Proteins/genetics , Plants/genetics , Structure-Activity Relationship
5.
Gene ; 205(1-2): 95-102, 1997 Dec 31.
Article in English | MEDLINE | ID: mdl-9461382

ABSTRACT

The important role of the 5' and 3' untranslated regions of eukaryotic mRNAs in gene regulation and expression is now widely accepted. In order to study the general structural and compositional features of these sequences, we developed UTRdb, a specialized database of 5'- and 3'-UTR sequences from seven different taxonomic groups of eukaryotic mRNAs, cleaned of redundancy. The analysis of the UTR sequences contained in this database showed that 5'-UTR sequences, on average 200 nucleotides long, are 1.5-3 times shorter than the corresponding 3'-UTR sequences in the various taxonomic groups considered here. As to their compositional properties, 5'-UTR sequences were on average GC-richer than 3'-UTR sequences in all cases, and significant correlations were found between the GC content of 5'- and 3'-UTR sequences and the GC content of the third (silent) codon positions of the corresponding protein-coding genes. The dinucleotide analysis showed a differential depletion of CpG in vertebrate 5'- and 3'-UTRs, with 5'-UTR sequences being more CpG-rich, and a generalized depletion of TpA in both 5'- and 3'-UTRs was observed in all eukaryotic sequence collections.
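The abstract does not state how dinucleotide depletion was quantified. A common measure, assumed here, is the odds ratio f(XY) / (f(X) f(Y)), where values well below 1 indicate depletion (as for CpG and TpA). The sketch below computes it together with GC content; function names are hypothetical, and sequences are assumed to be uppercase DNA-style strings containing both bases of the dinucleotide.

```python
from collections import Counter


def gc_content(seq):
    """Fraction of G and C in the sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def dinucleotide_odds_ratio(seq, dinuc):
    """rho_XY = f(XY) / (f(X) * f(Y)); values well below 1 suggest depletion."""
    seq = seq.upper()
    mono = Counter(seq)
    pairs = Counter(seq[i:i + 2] for i in range(len(seq) - 1))
    f_x = mono[dinuc[0]] / len(seq)
    f_y = mono[dinuc[1]] / len(seq)
    f_xy = pairs[dinuc] / (len(seq) - 1)
    return f_xy / (f_x * f_y)
```

Applied separately to 5'-UTR and 3'-UTR collections, `dinucleotide_odds_ratio(seq, "CG")` and `dinucleotide_odds_ratio(seq, "TA")` would give the kind of comparison the abstract describes.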


Subjects
RNA, Messenger/genetics , Base Composition , Introns , Protein Biosynthesis
6.
Gene ; 261(1): 85-91, 2000 Dec 30.
Article in English | MEDLINE | ID: mdl-11164040

ABSTRACT

The context features of the AUG start codon have been investigated by analyzing eukaryotic mRNAs belonging to various taxonomic groups. The functional relevance of each specific position surrounding the AUG start codon has been established as a function of the measured shift between the base composition observed at that particular position and the base composition averaged over all the 5' untranslated regions. A more detailed analysis carried out on human genes belonging to different isochores showed significant isochore-specific features that cannot be explained only by a mutational bias effect. The most represented heptamers spanning from position -3 to +4 with respect to the initiator AUG have been determined for mRNAs belonging to different taxonomic groups, and a web page utility has been set up (http://bigarea.area.ba.cnr.it:8000/BioWWW/ATG.html) to determine the relative abundance of a user-submitted oligonucleotide context in a given species or taxon.
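A minimal sketch of extracting the -3..+4 heptamer and computing the relative abundance of a query context is shown below. It assumes the usual numbering in which +1 is the A of AUG (there is no position 0), 0-based indices into the mRNA, and hypothetical function names; it is not the web utility mentioned above.

```python
from collections import Counter


def start_context(mrna, cds_start):
    """Return the -3..+4 heptamer around an AUG whose A is at 0-based index cds_start.

    Position +1 is the A of AUG (there is no position 0), so -3..-1 are the three
    upstream bases and +1..+4 are AUG plus the following base.
    """
    mrna = mrna.upper()
    if cds_start < 3 or cds_start + 4 > len(mrna):
        return None
    if mrna[cds_start:cds_start + 3] != "AUG":
        return None
    return mrna[cds_start - 3:cds_start + 4]


def context_abundance(records, query):
    """Relative abundance of one heptamer among (mrna, cds_start) records."""
    contexts = Counter(
        ctx for ctx in (start_context(m, s) for m, s in records) if ctx is not None
    )
    total = sum(contexts.values())
    return contexts[query.upper()] / total if total else 0.0
```

Calling `most_common()` on the same `Counter` would list the most represented heptamers for a taxon.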


Subjects
Codon, Initiator/genetics , Eukaryotic Cells/metabolism , RNA, Messenger/genetics , 5' Untranslated Regions/genetics , 5' Untranslated Regions/metabolism , Animals , Base Composition , Base Sequence , Binding Sites , DNA/genetics , Databases, Factual , Genes/genetics , Genome, Human , Humans , Open Reading Frames/genetics , Ribosomes/metabolism
7.
Gene ; 276(1-2): 73-81, 2001 Oct 03.
Article in English | MEDLINE | ID: mdl-11591473

ABSTRACT

The crucial role of the non-coding portion of genomes is now widely acknowledged. In particular, mRNA untranslated regions are involved in many post-transcriptional regulatory pathways that control mRNA localization, stability and translation efficiency. We review in this paper the major structural and compositional features of eukaryotic mRNA untranslated regions and provide some examples of bioinformatic analyses for their functional characterization.


Subjects
3' Untranslated Regions/genetics , 5' Untranslated Regions/genetics , Eukaryotic Cells/metabolism , RNA, Messenger/genetics , Animals , Base Composition , Base Sequence , Conserved Sequence , Databases, Factual , Humans , Introns , Regulatory Sequences, Nucleic Acid , Repetitive Sequences, Nucleic Acid
8.
Biotechniques ; 25(1): 112-7, 120-3, 1998 Jul.
Article in English | MEDLINE | ID: mdl-9668985

ABSTRACT

A computer program is presented that selects a small set of short PCR primer pairs to sample all the sequences in a user-specified list of mRNAs. Such primer pairs could be used to increase the probability of sampling mRNAs of particular interest in differential display and to generate simplified hybridization probes for DNA chips or arrays. The program uses simulated PCR to find pairs of primers that sample more than one sequence in the list. A small set of such primer pairs is then selected that gives maximal coverage of the sequences in the list. Primer pairs are excluded if they: (i) generate simulated PCR products of the same size from a number of sequences in the list, (ii) can easily form primer dimers, (iii) are outside a specified range of G + C content or (iv) occur in another list of undesirable sequences, such as rRNAs and Alu repeats. Five lists of 48-285 cDNA sequences each were used to test the program. A small number of primer pairs, 8-10 bases in length, were selected that fit the above criteria and generate one or more simulated PCR products in all or most of the cDNAs in each list.
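The selection step described above can be sketched as exact-match simulated PCR followed by a greedy set-cover choice of primer pairs. This is a minimal illustration under stated assumptions: primers must match exactly (no mismatches), only exclusion criterion (iii) on G + C content is implemented, the greedy strategy and all names are hypothetical, and none of it is taken from the published program.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    return seq.translate(COMPLEMENT)[::-1]


def gc_fraction(seq):
    return (seq.count("G") + seq.count("C")) / len(seq)


def simulated_products(fwd, rev, seq, size_range):
    """Sizes of exact-match simulated PCR products of a primer pair on seq."""
    lo, hi = size_range
    site = revcomp(rev)  # the reverse primer's site appears on the top strand as its reverse complement
    sizes = []
    start = seq.find(fwd)
    while start != -1:
        end = seq.find(site, start + len(fwd))
        while end != -1:
            size = end + len(site) - start
            if lo <= size <= hi:
                sizes.append(size)
            end = seq.find(site, end + 1)
        start = seq.find(fwd, start + 1)
    return sizes


def greedy_primer_selection(candidates, seqs, size_range=(100, 600),
                            gc_range=(0.4, 0.7), max_pairs=10):
    """Greedily pick primer pairs that cover the most still-uncovered sequences."""
    coverage = {}
    for fwd, rev in candidates:
        if not all(gc_range[0] <= gc_fraction(p) <= gc_range[1] for p in (fwd, rev)):
            continue  # criterion (iii): G + C content out of range
        covered = {i for i, s in enumerate(seqs)
                   if simulated_products(fwd, rev, s, size_range)}
        if covered:
            coverage[(fwd, rev)] = covered

    chosen, uncovered = [], set(range(len(seqs)))
    while uncovered and coverage and len(chosen) < max_pairs:
        best = max(coverage, key=lambda pair: len(coverage[pair] & uncovered))
        gain = coverage.pop(best) & uncovered
        if not gain:
            break
        chosen.append(best)
        uncovered -= gain
    return chosen, uncovered
```

Greedy set cover is a natural heuristic for "maximal coverage with few pairs"; criteria (i), (ii) and (iv) would be further filters applied before the greedy loop.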


Subjects
DNA Primers/genetics , Software , Base Sequence , Computational Biology , DNA Primers/chemistry , DNA, Complementary/chemistry , DNA, Complementary/genetics , Polymerase Chain Reaction , Programming Languages
10.
Nucleic Acids Res ; 16(5): 1715-28, 1988 Mar 11.
Article in English | MEDLINE | ID: mdl-3281142

ABSTRACT

This study describes a method for the backtranslation of an amino acid sequence, an extremely useful tool for various experimental approaches. It involves two computer programs, CLUSTER and BACKTR, written in Fortran 77 and running on a VAX/VMS computer. CLUSTER generates a reliable codon usage table through a cluster analysis based on a chi-square-like distance between the sequences. Using the codon usage table thus obtained, BACKTR produces backtranslated sequences according to different options, and also selects the least ambiguous potential oligonucleotide probes within an amino acid sequence. The method was tested by applying it to 158 yeast genes.
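Two of the ideas above can be sketched briefly: backtranslating with the most frequent codon from a usage table, and choosing the probe window with the fewest possible DNA variants (the product of synonymous codon counts in the standard genetic code). The usage table, window width and function names are hypothetical; CLUSTER's cluster analysis and BACKTR's other options are not reproduced.

```python
from math import prod

# Number of synonymous codons per amino acid in the standard genetic code.
DEGENERACY = {
    "A": 4, "R": 6, "N": 2, "D": 2, "C": 2, "Q": 2, "E": 2, "G": 4, "H": 2, "I": 3,
    "L": 6, "K": 2, "M": 1, "F": 2, "P": 4, "S": 6, "T": 4, "W": 1, "Y": 2, "V": 4,
}


def backtranslate(protein, codon_usage):
    """Pick the most frequent codon for each residue from a codon usage table."""
    return "".join(max(codon_usage[aa], key=codon_usage[aa].get) for aa in protein)


def least_ambiguous_window(protein, width):
    """Return (start, degeneracy) of the window whose probe has the fewest variants."""
    best = None
    for i in range(len(protein) - width + 1):
        deg = prod(DEGENERACY[aa] for aa in protein[i:i + width])
        if best is None or deg < best[1]:
            best = (i, deg)
    return best
```

Regions rich in Met and Trp (one codon each) give the least degenerate probes, while Leu, Arg and Ser (six codons each) are best avoided.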


Subjects
Amino Acid Sequence , Base Sequence , Codon/genetics , Protein Biosynthesis , RNA, Messenger/genetics , Software/methods , Genes, Fungal , Information Systems , Mathematics , Saccharomyces cerevisiae/genetics
11.
Comput Chem ; 20(1): 141-4, 1996 Mar.
Article in English | MEDLINE | ID: mdl-8867845

ABSTRACT

The important role that untranslated regions (UTRs) of mRNAs may play in gene regulation and expression is now widely accepted. For this reason we developed UTRDB, a specialized database of 5'- and 3'-UTRs of eukaryotic mRNAs cleaned of redundancy. This paper describes the composition and general features of UTRDB. The analysis of UTRDB using suitable statistical methods could provide useful information for guiding experimental work aimed at elucidating the role of UTR sequences in gene regulation and expression.


Subjects
Protein Biosynthesis/genetics , RNA, Messenger/chemistry , Algorithms , Animals , Gene Expression Regulation/genetics , Human Genome Project , Information Systems , RNA, Messenger/genetics , Sequence Analysis , Software
12.
Protein Seq Data Anal ; 3(4): 327-34, 1990 Sep.
Article in English | MEDLINE | ID: mdl-2235975

ABSTRACT

EMBL and GenBank keyword indexes have no hierarchical structure. In this paper we present a method for merging and reorganizing them in a tree structure whose primary roots are the keywords 'protein', 'DNA', 'RNA', and 'unclassified'. Synonymous keywords have been grouped together and erroneous keywords have been corrected. This taxonomic organization of keywords results in a more extensive and efficient retrieval which is further aided by "synonyms declaration". The tree has been produced using the computer programs GENPOINT and CREANET.


Subjects
Base Sequence , Databases, Factual , Information Storage and Retrieval , Abstracting and Indexing as Topic
13.
Nucleic Acids Res ; 18(13): 3745-52, 1990 Jul 11.
Article in English | MEDLINE | ID: mdl-2197595

ABSTRACT

A method is proposed for the automatic detection of serial periodicities in a linear sequence. Its application to DNA subtelomeric sequences from two lower eukaryotes, P.falciparum and S.cerevisiae, reveals ordered patterns organised in hierarchical periodicities, not easily recognizable by other methods. The possible implications concerning the evolution of tandemly repetitive arrays are discussed in light of a model which involves, as successive steps, random repeat modification, the fusion of differently modified repeat versions into longer units, and the amplification of (and/or homogenization to) the more recent repeat units.
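The abstract does not describe the detection procedure itself. As a much simpler stand-in, assumed here purely for illustration, one can score each candidate period p by the fraction of positions where the sequence matches itself shifted by p; tandem repeats give peaks at the repeat length and its multiples, which hints at how nested (hierarchical) periodicities show up. Function names and the threshold are hypothetical.

```python
def periodicity_profile(seq, max_period):
    """Fraction of positions i with seq[i] == seq[i + p], for p = 1..max_period."""
    profile = {}
    for p in range(1, min(max_period, len(seq) - 1) + 1):
        matches = sum(seq[i] == seq[i + p] for i in range(len(seq) - p))
        profile[p] = matches / (len(seq) - p)
    return profile


def dominant_periods(seq, max_period, threshold=0.8):
    """Periods whose self-match fraction reaches the threshold, in increasing order."""
    profile = periodicity_profile(seq, max_period)
    return [p for p, score in sorted(profile.items()) if score >= threshold]
```

With modified (imperfect) repeats, as in real subtelomeric arrays, peaks fall below 1.0, so the threshold would need tuning.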


Subjects
DNA, Fungal , DNA , Repetitive Sequences, Nucleic Acid , Algorithms , Animals , Base Sequence , Methods , Molecular Sequence Data , Plasmodium falciparum/genetics , Saccharomyces cerevisiae/genetics
14.
Nucleic Acids Res ; 28(1): 153-4, 2000 Jan 01.
Article in English | MEDLINE | ID: mdl-10592208

ABSTRACT

The AMmtDB database (http://bio-www.ba.cnr.it:8000/srs6/) has been updated by collecting the multi-aligned sequences of Chordata mitochondrial genes coding for proteins and tRNAs. The genes coding for proteins are multi-aligned based on the translated sequences, and both the nucleotide and amino acid multi-alignments are provided. AMmtDB data selected through SRS can be viewed and managed using GeneDoc or other programs for the management of multi-aligned data, depending on the user's operating system. The multiple alignments have been produced with the CLUSTALW and PILEUP programs and then carefully optimized manually.


Subjects
Chordata, Nonvertebrate/genetics , DNA, Mitochondrial/genetics , Databases, Factual , Animals , Internet , Sequence Alignment
15.
Bioinformatics ; 16(5): 439-50, 2000 May.
Article in English | MEDLINE | ID: mdl-10871266

ABSTRACT

MOTIVATION: The identification of sequence patterns involved in gene regulation and expression is a major challenge in molecular biology. In this paper we describe a novel algorithm and the software for searching nucleotide and protein sequences for complex nucleotide patterns, including potential secondary structure elements, allowing for mismatches/mispairings below a user-fixed threshold and assessing the statistical significance of their occurrence through a Markov chain simulation. RESULTS: The application of the proposed algorithm allowed the identification of functional elements such as the Iron Responsive Element, the Histone stem-loop structure and the Selenocysteine Insertion Sequence, located in the mRNA untranslated regions of post-transcriptionally regulated genes, and the sensitivity and selectivity of the search method were assessed. AVAILABILITY: A Web interface is available at: http://bigarea.area.ba.cnr.it:8000/EmbIT/Patsearch.html.
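The significance-assessment idea can be sketched as follows: count occurrences of a simple pattern allowing up to a fixed number of mismatches, then compare that count with counts in sequences generated from a first-order Markov chain fitted to the input. This is a minimal sketch under assumptions: ungapped substitution mismatches only, no secondary-structure elements or mispairings, and hypothetical names; it is not the published algorithm.

```python
import random
from collections import defaultdict


def count_matches(seq, pattern, max_mismatches):
    """Number of windows matching pattern with at most max_mismatches substitutions."""
    m = len(pattern)
    return sum(
        sum(a != b for a, b in zip(seq[i:i + m], pattern)) <= max_mismatches
        for i in range(len(seq) - m + 1)
    )


def train_markov1(seq):
    """First-order transition counts: trans[a][b] = occurrences of a followed by b."""
    trans = defaultdict(lambda: defaultdict(int))
    for a, b in zip(seq, seq[1:]):
        trans[a][b] += 1
    return trans


def sample_markov1(trans, first, length, rng):
    """Generate a random sequence from the fitted first-order Markov chain."""
    out = [first]
    while len(out) < length:
        nxt = trans.get(out[-1])
        if not nxt:  # symbol never followed by anything (e.g. seen only at the end)
            out.append(first)
            continue
        symbols = list(nxt)
        out.append(rng.choices(symbols, weights=[nxt[s] for s in symbols])[0])
    return "".join(out)


def markov_pvalue(seq, pattern, max_mismatches, n_sim=200, seed=0):
    """Empirical p-value of the observed match count under the Markov null model."""
    rng = random.Random(seed)
    observed = count_matches(seq, pattern, max_mismatches)
    trans = train_markov1(seq)
    hits = sum(
        count_matches(sample_markov1(trans, seq[0], len(seq), rng), pattern,
                      max_mismatches) >= observed
        for _ in range(n_sim)
    )
    return observed, (hits + 1) / (n_sim + 1)
```

A Markov null preserves the dinucleotide composition of the input, so a small p-value points to enrichment beyond what local composition alone would produce.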


Subjects
Sequence Alignment/methods , Software , 3' Untranslated Regions , 5' Untranslated Regions , Algorithms , Amino Acid Sequence , Animals , Base Sequence , Computer Simulation , DNA/chemistry , DNA/genetics , Databases, Factual , Histones/genetics , Humans , Nucleic Acid Conformation , Pattern Recognition, Automated , RNA, Messenger/chemistry , RNA, Messenger/genetics , Sensitivity and Specificity , Sequence Alignment/statistics & numerical data
16.
Brief Bioinform ; 1(3): 236-49, 2000 Sep.
Article in English | MEDLINE | ID: mdl-11465035

ABSTRACT

The crucial role of the non-coding portion of genomes is now widely acknowledged. In particular, mRNA untranslated regions are involved in many post-transcriptional regulatory pathways that control mRNA localisation, stability and translation efficiency. A review is given of the most recent research on the functional characterisation of eukaryotic mRNA untranslated regions. To enable a systematic and detailed sequence analysis of mRNA untranslated regions (UTRs), UTRdb, a non-redundant database of metazoan mRNA untranslated sequences annotated for the occurrence of specific functional elements, was devised. These elements, whose consensus structures have been defined on the basis of experimental assays and comparative analyses, have been collected in the UTRsite database. Suitable pattern-matching software has been developed to search for UTRsite patterns in user-submitted sequences, also assessing their statistical significance. Structural, compositional and evolutionary features of untranslated sequences of metazoan mRNAs have been investigated, revealing peculiar intra- and interspecific patterns.


Subjects
RNA, Messenger/genetics , Untranslated Regions , Animals , Base Sequence , Computational Biology , Databases, Factual , Eukaryotic Cells , Evolution, Molecular , Humans , Internet , Molecular Sequence Data , Protein Biosynthesis , RNA Stability , RNA, Messenger/chemistry , RNA, Messenger/metabolism
17.
Nucleic Acids Res ; 26(1): 192-5, 1998 Jan 01.
Article in English | MEDLINE | ID: mdl-9399833

ABSTRACT

The important role that the untranslated regions of eukaryotic mRNAs may play in gene regulation and expression is now widely acknowledged. For this reason we developed UTRdb, a specialized database of 5'- and 3'-untranslated sequences of eukaryotic mRNAs cleaned of redundancy. UTRdb entries are enriched with specialized information not present in the primary databases, including the presence of patterns already demonstrated by experimental analysis to have a functional role. These patterns are being collected in the UTRsite database (http://bio-www.ba.cnr.it:8000/srs5/), which can also be used with appropriate computational tools to detect known functional patterns contained in mRNA untranslated regions.


Subjects
Databases, Factual , Protein Biosynthesis , RNA, Messenger/genetics , Animals , Computer Communication Networks , Eukaryotic Cells , Humans , Information Storage and Retrieval
18.
Comput Appl Biosci ; 12(1): 1-8, 1996 Feb.
Article in English | MEDLINE | ID: mdl-8670613

ABSTRACT

A key concept in comparing sequence collections is redundancy. The production of sequence collections free from redundancy is undoubtedly very useful, both for performing statistical analyses and for accelerating extensive database searches on nucleotide sequences. Indeed, publicly available databases contain multiple entries of identical or almost identical sequences. Performing statistical analysis on such biased data greatly increases the risk of assigning high significance to non-significant patterns. In order to carry out unbiased statistical analysis, as well as more efficient database searching, it is thus necessary to analyse sequence data that have been purged of redundancy. Given that an unambiguous definition of redundancy is impracticable for biological sequence data, the present program uses a quantitative description of redundancy based on a measure of sequence similarity. A sequence is considered redundant if it shows a degree of similarity and overlap with a longer sequence in the database greater than a threshold fixed by the user. In this paper we present a new algorithm based on an "approximate string matching" procedure, which is able to determine the overall degree of similarity between each pair of sequences contained in a nucleotide sequence database and to automatically generate nucleotide sequence collections free from redundancy.
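The redundancy rule stated above (a sequence is redundant if it is sufficiently similar to a longer sequence, above a user threshold) can be sketched with a deliberately simplified similarity measure: the best ungapped identity of the shorter sequence at any offset inside the longer one. This replaces the paper's approximate string matching with a cruder, gap-free comparison and ignores partial (end) overlaps; names and the threshold default are hypothetical.

```python
def best_ungapped_identity(short, long):
    """Highest fraction of identical positions of short placed at any offset in long."""
    best = 0.0
    for off in range(len(long) - len(short) + 1):
        same = sum(a == b for a, b in zip(short, long[off:off + len(short)]))
        best = max(best, same / len(short))
    return best


def remove_redundancy(seqs, threshold=0.95):
    """Keep a sequence unless it matches an already kept, longer one above threshold."""
    kept = []
    for s in sorted(seqs, key=len, reverse=True):  # longest first; stable for ties
        if not any(best_ungapped_identity(s, k) >= threshold for k in kept):
            kept.append(s)
    return kept
```

Processing sequences longest first means each candidate is only compared against sequences at least as long, mirroring the "longer sequence" condition in the definition.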


Subjects
Databases, Factual , Sequence Alignment/methods , Software , Algorithms , Base Sequence , DNA/genetics , Evaluation Studies as Topic , Molecular Sequence Data , Sequence Alignment/statistics & numerical data , Sequence Analysis/methods , Sequence Analysis/statistics & numerical data , Sequence Homology, Nucleic Acid
19.
Comput Appl Biosci ; 9(5): 541-5, 1993 Oct.
Article in English | MEDLINE | ID: mdl-8293327

ABSTRACT

A new string searching algorithm is presented for finding occurrences of character patterns in longer character texts. The algorithm, specifically designed for nucleic acid sequence data, is essentially derived from the Boyer-Moore method (Comm. ACM, 20, 762-772, 1977). Both pattern and text data are compressed so that the natural 4-letter alphabet of nucleic acid sequences is considerably enlarged. The string search starts from the last character of the pattern and proceeds in large jumps through the text to be searched. The data compression and searching algorithm allows one to avoid searching for patterns not present in the text, as well as to avoid inspecting, for each pattern, all text characters until the exact match with the text is found. These considerations are supported by empirical evidence and comparisons with other methods.
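The core idea of comparing from the last pattern character and jumping through the text can be illustrated with the Boyer-Moore-Horspool simplification, which uses only a bad-character shift table. This sketch works on the plain 4-letter alphabet and omits the data compression described in the abstract, which is what enlarges the alphabet and makes the jumps longer; the function name is hypothetical.

```python
def horspool_search(text, pattern):
    """Return all start positions of pattern in text (Boyer-Moore-Horspool)."""
    m, n = len(pattern), len(text)
    if m == 0 or m > n:
        return []
    # Shift for each character = distance from its rightmost occurrence
    # (excluding the last pattern position) to the end of the pattern.
    shift = {c: m - 1 - i for i, c in enumerate(pattern[:-1])}
    hits = []
    pos = 0
    while pos <= n - m:
        j = m - 1
        while j >= 0 and text[pos + j] == pattern[j]:  # compare right to left
            j -= 1
        if j < 0:
            hits.append(pos)
        # Jump according to the text character aligned with the pattern's end.
        pos += shift.get(text[pos + m - 1], m)
    return hits
```

With only four symbols, most characters occur near the end of any pattern, so shifts stay short; encoding several bases as one symbol (as the paper does) enlarges the alphabet and lengthens the average jump.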


Subjects
Algorithms , Sequence Analysis, DNA/methods , Software , Base Sequence , DNA/genetics , Databases, Factual , Molecular Sequence Data , Pattern Recognition, Automated , Sequence Analysis, DNA/statistics & numerical data
20.
Nucleic Acids Res ; 28(1): 159-62, 2000 Jan 01.
Article in English | MEDLINE | ID: mdl-10592210

ABSTRACT

The current version of PLMItRNA is a database of tRNA molecules and genes identified in the mitochondria of all green plants (Viridiplantae). It is an enlargement of a previous database originally restricted to seed plants [Ceci,L.R., Volpicella,M., Liuni,S., Volpetti,V., Licciulli,F. and Gallerani,R. (1999) Nucleic Acids Res., 27, 156-157]. PLMItRNA reports information and multialignments on 254 genes and 16 tRNA molecules detected in 25 higher plants (one bryophyte and 24 vascular plants) and seven green algae. PLMItRNA is accessible via the WWW at http://bio-WWW.ba.cnr.it:8000/srs6/


Subjects
Databases, Factual , Mitochondria/metabolism , Plants/genetics , RNA, Transfer/genetics , Plants/classification