1.
Bioinformatics ; 23(16): 2063-72, 2007 Aug 15.
Article in English | MEDLINE | ID: mdl-17540679

ABSTRACT

MOTIVATION: A major challenge in current biomedical research is the identification of cellular processes deregulated in a given pathology through the analysis of gene expression profiles. To this end, predefined lists of genes, coding specific functions, are compared with a list of genes ordered according to their values of differential expression measured by suitable univariate statistics. RESULTS: We propose a statistically well-founded method for measuring the relevance of predefined lists of genes and for assessing their statistical significance starting from their raw expression levels as recorded on the microarray. We use prediction accuracy as a measure of relevance of the list. The rationale is that a functional category, coded through a list of genes, is perturbed in a given pathology if it is possible to correctly predict the occurrence of the disease in new subjects on the basis of the expression levels of the genes belonging to the list only. The accuracy is estimated with a multiple random validation strategy and its statistical significance is assessed against two null hypotheses, by using two independent permutation tests. The utility of the proposed methodology is illustrated by analyzing the relevance of Gene Ontology terms belonging to the biological process category in colon and prostate cancer, by using three different microarray data sets and by comparing it with current approaches. AVAILABILITY: Source code for the algorithms is available from the author upon request. SUPPLEMENTARY INFORMATION: Colon cancer data set and a complete description of experimental results are available at: ftp://bioftp:76bioftpxxx@marx.ba.issia.cnr.it/supp-info.htm.
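
The idea of scoring a gene list by prediction accuracy and testing that accuracy by permutation can be sketched as follows. This is a minimal illustration, not the authors' code: the nearest-centroid classifier, the function names, and the single label-permutation null are assumptions, since the abstract does not name the classifier or spell out its two null hypotheses.

```python
import random

def centroid(rows):
    """Column-wise mean of a list of expression vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def nearest_centroid_accuracy(X, y, train_idx, test_idx):
    """Train on train_idx, return the fraction of test_idx predicted correctly."""
    groups = {}
    for i in train_idx:
        groups.setdefault(y[i], []).append(X[i])
    cents = {label: centroid(rows) for label, rows in groups.items()}
    correct = 0
    for i in test_idx:
        pred = min(cents, key=lambda label: sum((a - b) ** 2 for a, b in zip(X[i], cents[label])))
        correct += pred == y[i]
    return correct / len(test_idx)

def random_validation_accuracy(X, y, n_splits, test_fraction, rng):
    """Average accuracy over repeated random train/test splits."""
    idx = list(range(len(y)))
    n_test = max(1, int(len(y) * test_fraction))
    total = 0.0
    for _ in range(n_splits):
        rng.shuffle(idx)
        total += nearest_centroid_accuracy(X, y, idx[n_test:], idx[:n_test])
    return total / n_splits

def label_permutation_p_value(X, y, n_perm=200, n_splits=20, test_fraction=0.3, seed=0):
    """Compare observed accuracy with accuracies obtained after shuffling the labels."""
    rng = random.Random(seed)
    observed = random_validation_accuracy(X, y, n_splits, test_fraction, rng)
    hits = 0
    for _ in range(n_perm):
        shuffled = y[:]
        rng.shuffle(shuffled)
        if random_validation_accuracy(X, shuffled, n_splits, test_fraction, rng) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)
```

Here `X` would hold only the expression values of the genes in the list under study, one row per subject. The `+ 1` in the p-value keeps the estimate away from zero when no permutation reaches the observed accuracy.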


Subject(s)
Biomarkers, Tumor/metabolism , Gene Expression Profiling/methods , Gene Expression Regulation, Neoplastic , Multigene Family , Neoplasm Proteins/metabolism , Neoplasms/metabolism , Oligonucleotide Array Sequence Analysis/methods , Data Interpretation, Statistical , Humans , Male , Neoplasm Proteins/classification
2.
BMC Bioinformatics ; 7: 387, 2006 Aug 19.
Article in English | MEDLINE | ID: mdl-16919171

ABSTRACT

BACKGROUND: In this paper we present a method for the statistical assessment of cancer predictors which make use of gene expression profiles. The methodology is applied to a new data set of microarray gene expression data collected in Casa Sollievo della Sofferenza Hospital, Foggia, Italy. The data set is made up of normal (22) and tumor (25) specimens extracted from 25 patients affected by colon cancer. We propose to give answers to some questions which are relevant for the automatic diagnosis of cancer such as: Is the size of the available data set sufficient to build accurate classifiers? What is the statistical significance of the associated error rates? In what ways can accuracy be considered dependent on the adopted classification scheme? How many genes are correlated with the pathology and how many are sufficient for an accurate colon cancer classification? The method we propose answers these questions whilst avoiding the potential pitfalls hidden in the analysis and interpretation of microarray data. RESULTS: We estimate the generalization error, evaluated through the Leave-K-Out Cross Validation error, for three different classification schemes by varying the number of training examples and the number of the genes used. The statistical significance of the error rate is measured by using a permutation test. We provide a statistical analysis in terms of the frequencies of the genes involved in the classification. Using the whole set of genes, we found that the Weighted Voting Algorithm (WVA) classifier learns the distinction between normal and tumor specimens with 25 training examples, providing e = 21% (p = 0.045) as an error rate. This remains constant even when the number of examples increases. Moreover, Regularized Least Squares (RLS) and Support Vector Machines (SVM) classifiers can learn with only 15 training examples, with an error rate of e = 19% (p = 0.035) and e = 18% (p = 0.037) respectively. Their error rate decreases as the training set size increases, reaching its best performance with 35 training examples. In this case, RLS and SVM have error rates of e = 14% (p = 0.027) and e = 11% (p = 0.019). Concerning the number of genes, we found about 6000 genes (p < 0.05) correlated with the pathology according to the signal-to-noise statistic. The performance of the RLS and SVM classifiers does not change when 74% of the genes are used; it degrades progressively, reaching e = 16% (p < 0.05) when only 2 genes are employed. The biological relevance of a set of genes determined by our statistical analysis and the major roles they play in colorectal tumorigenesis is discussed. CONCLUSIONS: The method proposed provides statistically significant answers to precise questions relevant for the diagnosis and prognosis of cancer. We found that, with as few as 15 examples, it is possible to train statistically significant classifiers for colon cancer diagnosis. As for the definition of the number of genes sufficient for a reliable classification of colon cancer, our results suggest that it depends on the accuracy required.
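
The gene-ranking step can be illustrated with a short sketch. The abstract names a signal-to-noise statistic without defining it; the code below assumes the widely used form (difference of class means divided by the sum of class standard deviations), and the function names and data layout are hypothetical.

```python
from statistics import mean, stdev

def signal_to_noise(tumor, normal):
    """(mean_tumor - mean_normal) / (sd_tumor + sd_normal); assumed definition."""
    spread = stdev(tumor) + stdev(normal)
    if spread == 0:
        raise ValueError("both classes have zero variance")
    return (mean(tumor) - mean(normal)) / spread

def rank_genes(expression, is_tumor):
    """Order gene names by decreasing absolute signal-to-noise score.

    expression: dict mapping gene name -> list of values, one per specimen
    is_tumor:   list of booleans aligned with those values
    """
    scores = {}
    for gene, values in expression.items():
        tumor = [v for v, t in zip(values, is_tumor) if t]
        normal = [v for v, t in zip(values, is_tumor) if not t]
        scores[gene] = signal_to_noise(tumor, normal)
    return sorted(scores, key=lambda g: abs(scores[g]), reverse=True)
```

A p-value per gene, as reported in the abstract, would additionally require recomputing the scores on label-shuffled data; that step is omitted here.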


Subject(s)
Algorithms , Oligonucleotide Array Sequence Analysis/methods , Statistics as Topic/methods , Aged , Colonic Neoplasms/classification , Colonic Neoplasms/genetics , Data Interpretation, Statistical , Female , Gene Expression Profiling/methods , Gene Expression Regulation, Neoplastic/genetics , Humans , Male , Middle Aged , Models, Statistical , Numerical Analysis, Computer-Assisted , Reproducibility of Results , Software
3.
Nucleic Acids Res ; 29(1): 167-8, 2001 Jan 01.
Article in English | MEDLINE | ID: mdl-11125079

ABSTRACT

The PLMItRNA database for mitochondrial tRNA molecules and genes in Viridiplantae (green plants) [Volpetti,V., Gallerani,R., DeBenedetto,C., Liuni,S., Licciulli,F. and Ceci,L.R. (2000) Nucleic Acids Res., 28, 159-162] has been enlarged to include algae. The database now contains 436 genes and 16 tRNA entries relative to 25 higher plants, eight green algae, four red algae (Rhodophytae) and two Stramenopiles. The PLMItRNA database is accessible via the WWW at http://bio-www.ba.cnr.it:8000/PLMItRNA.


Subject(s)
DNA, Mitochondrial/genetics , Databases, Factual , Eukaryotic Cells/metabolism , RNA, Transfer/genetics , Eukaryota/genetics , Information Services , Internet , Photosynthesis , Plants/genetics
4.
Nucleic Acids Res ; 30(1): 347-8, 2002 Jan 01.
Article in English | MEDLINE | ID: mdl-11752333

ABSTRACT

PLANT-PIs is a database developed to facilitate retrieval of information on plant protease inhibitors (PIs) and related genes. For each PI, links to sequence databases are reported together with a summary of the functional properties of the molecule (and its mutants) as deduced from literature. PLANT-PIs contains information for 351 plant PIs, plus several isoinhibitors. The database is accessible at http://bighost.area.ba.cnr.it/PLANT-PIs.


Subject(s)
Databases, Protein , Genes, Plant , Plants/enzymology , Protease Inhibitors/chemistry , Amino Acid Sequence , Binding Sites , DNA Mutational Analysis , DNA, Plant/analysis , Gene Expression , Information Storage and Retrieval , Internet , Plant Proteins/chemistry , Plant Proteins/genetics , Plants/genetics , Structure-Activity Relationship
5.
Gene ; 205(1-2): 95-102, 1997 Dec 31.
Article in English | MEDLINE | ID: mdl-9461382

ABSTRACT

The important role of 5' and 3' untranslated regions of eukaryotic mRNAs in gene regulation and expression is now widely accepted. In order to study the general structural and compositional features of these sequences we developed UTRdb, a specialized database of 5' and 3'-UTR sequences from seven different taxonomic groups of eukaryotic mRNAs cleaned of redundancy. The analysis of the UTR sequences contained in this database showed that 5'-UTR sequences, on average 200 nucleotides long, are 1.5-3 times shorter than the corresponding 3'-UTR sequences in the various taxonomic groups considered here. As to their compositional properties, 5'-UTR sequences were on average GC-richer than 3'-UTR sequences in all cases, and significant correlations were found between the GC content of 5' and 3'-UTR sequences and the GC content of the third silent codon positions of the corresponding protein coding genes. The dinucleotide analysis showed a differential depletion of CpG in vertebrate 5' and 3'-UTR, with 5'-UTR sequences being more CpG-rich, and a generalized depletion of TpA in both 5' and 3'-UTR was observed in all eukaryotic sequence collections.
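
The compositional measures mentioned here (GC content and CpG depletion) can be sketched as below. The observed/expected CpG ratio shown is a common convention and is an assumption: the abstract does not state which depletion measure was used.

```python
def gc_content(seq):
    """Fraction of G and C bases in a nucleotide sequence."""
    if not seq:
        raise ValueError("empty sequence")
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)

def cpg_obs_exp(seq):
    """Observed CpG count divided by the count expected from base frequencies.

    Values well below 1 indicate CpG depletion.
    """
    if not seq:
        raise ValueError("empty sequence")
    seq = seq.upper()
    c, g = seq.count("C"), seq.count("G")
    if c == 0 or g == 0:
        return 0.0
    return seq.count("CG") * len(seq) / (c * g)
```

The same ratio with `"TA"`, `"T"` and `"A"` in place of `"CG"`, `"C"` and `"G"` would give a comparable measure for TpA.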


Subject(s)
RNA, Messenger/genetics , Base Composition , Introns , Protein Biosynthesis
6.
Gene ; 261(1): 85-91, 2000 Dec 30.
Article in English | MEDLINE | ID: mdl-11164040

ABSTRACT

The AUG start codon context features have been investigated by analyzing eukaryotic mRNAs belonging to various taxonomic groups. The functional relevance of each specific position surrounding the AUG start codon has been established as a function of the measured shift between base composition observed at that particular position, and base composition averaged over all the 5' untranslated regions. A more detailed analysis carried out on human genes belonging to different isochores showed significant isochore-specific features that cannot be explained only by a mutational bias effect. The most represented heptamers spanning from position -3 to +4 with respect to the initiator AUG have been determined for mRNAs belonging to different taxonomic groups and a web page utility has been set up (http://bigarea.area.ba.cnr.it:8000/BioWWW/ATG.html) to determine the relative abundance of a user-submitted oligonucleotide context in a given species or taxon.
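
Counting the heptamers from position -3 to +4 around the start codon can be sketched as follows. The sketch assumes the usual numbering in which the A of AUG is +1 and there is no position 0, so the window is three upstream bases, the start codon, and one downstream base. The function names and the `(sequence, start index)` input layout are hypothetical.

```python
from collections import Counter

def start_context(mrna, start):
    """Return the 7-mer covering positions -3..+4, or None if it does not fit.

    start is the 0-based index of the A of the annotated start codon.
    """
    if start < 3 or start + 4 > len(mrna):
        return None
    return mrna[start - 3:start + 4]

def heptamer_counts(records):
    """Count start-codon contexts over (sequence, start index) pairs."""
    counts = Counter()
    for mrna, start in records:
        ctx = start_context(mrna.upper(), start)
        # Skip records whose annotated start is not actually ATG/AUG.
        if ctx is not None and ctx[3:6] in ("ATG", "AUG"):
            counts[ctx] += 1
    return counts
```

Calling `heptamer_counts(records).most_common(10)` on the mRNAs of one taxon would list its most represented contexts.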


Subject(s)
Codon, Initiator/genetics , Eukaryotic Cells/metabolism , RNA, Messenger/genetics , 5' Untranslated Regions/genetics , 5' Untranslated Regions/metabolism , Animals , Base Composition , Base Sequence , Binding Sites , DNA/genetics , Databases, Factual , Genes/genetics , Genome, Human , Humans , Open Reading Frames/genetics , Ribosomes/metabolism
7.
Gene ; 276(1-2): 73-81, 2001 Oct 03.
Article in English | MEDLINE | ID: mdl-11591473

ABSTRACT

The crucial role of the non-coding portion of genomes is now widely acknowledged. In particular, mRNA untranslated regions are involved in many post-transcriptional regulatory pathways that control mRNA localization, stability and translation efficiency. We review in this paper the major structural and compositional features of eukaryotic mRNA untranslated regions and provide some examples of bioinformatic analyses for their functional characterization.


Subject(s)
3' Untranslated Regions/genetics , 5' Untranslated Regions/genetics , Eukaryotic Cells/metabolism , RNA, Messenger/genetics , Animals , Base Composition , Base Sequence , Conserved Sequence , Databases, Factual , Humans , Introns , Regulatory Sequences, Nucleic Acid , Repetitive Sequences, Nucleic Acid
8.
Biotechniques ; 25(1): 112-7, 120-3, 1998 Jul.
Article in English | MEDLINE | ID: mdl-9668985

ABSTRACT

A computer program is presented that selects a small set of short primer pairs for PCR to sample all the sequences in a user-specified list of mRNAs. Such primer pairs could be used to increase the probability of sampling mRNAs of particular interest in differential display and to generate simplified hybridization probes for DNA chips or arrays. The program uses simulated PCR to find pairs of primers that sample more than one sequence in the list. A small set of such primer pairs is selected that gives maximal coverage of the sequences in the list. Primer pairs are excluded that: (i) generate simulated PCR products of the same size from a number of sequences in the list, (ii) can easily form primer dimers, (iii) are outside a specified range of G + C content or (iv) occur in another list of undesirable sequences, such as rRNAs and Alu repeats. Five lists consisting of 48-285 cDNA sequences were used to test the program. A small number of pairs of primers, 8-10 bases in length, were selected that fit the above criteria and that generate one or more simulated PCR products in all or most of the cDNAs in each list.
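
Selecting a small set of primer pairs with maximal coverage is a set-cover problem. The sketch below uses a greedy heuristic, which is an assumption: the abstract does not say how the program chooses its set. The `coverage` mapping (primer pair to the sequences it amplifies in simulated PCR) is hypothetical input that would come from a separate simulation and filtering step.

```python
def greedy_primer_cover(coverage, max_pairs):
    """Pick up to max_pairs primer pairs, each time taking the pair that
    covers the most still-uncovered sequences.

    coverage: dict mapping primer-pair id -> set of sequence ids it samples
    Returns (chosen pair ids in pick order, set of sequences left uncovered).
    """
    uncovered = set().union(*coverage.values())
    chosen = []
    while uncovered and len(chosen) < max_pairs:
        # sorted() makes tie-breaking deterministic (first id in sort order wins).
        best = max(sorted(coverage), key=lambda pair: len(coverage[pair] & uncovered))
        gain = coverage[best] & uncovered
        if not gain:
            break
        chosen.append(best)
        uncovered -= gain
    return chosen, uncovered
```

Greedy selection is not guaranteed to find the smallest possible set, but it is simple and usually close for inputs of this size.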


Subject(s)
DNA Primers/genetics , Software , Base Sequence , Computational Biology , DNA Primers/chemistry , DNA, Complementary/chemistry , DNA, Complementary/genetics , Polymerase Chain Reaction , Programming Languages
10.
Nucleic Acids Res ; 16(5): 1715-28, 1988 Mar 11.
Article in English | MEDLINE | ID: mdl-3281142

ABSTRACT

This study describes a method for the backtranslation of an amino acid sequence, an extremely useful tool for various experimental approaches. It involves two computer programs, CLUSTER and BACKTR, written in Fortran 77 and running on a VAX/VMS computer. CLUSTER generates a reliable codon usage table through a cluster analysis, based on a chi-square-like distance between the sequences. BACKTR uses the resulting codon usage table to produce backtranslated sequences according to different options, and also selects the least ambiguous potential oligonucleotide probes within an amino acid sequence. The method was tested by applying it to 158 yeast genes.
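
The two tasks attributed to BACKTR can be illustrated with a small sketch. It is not a port of the Fortran program: choosing the most frequent codon is only one plausible backtranslation option, ambiguity is measured here simply as the product of the number of synonymous codons, and the codon usage table passed in is hypothetical input.

```python
def backtranslate(protein, codon_usage):
    """Replace each amino acid by its most frequent codon.

    codon_usage: dict mapping amino acid -> {codon: relative frequency}
    """
    return "".join(max(codon_usage[aa], key=codon_usage[aa].get) for aa in protein)

def least_ambiguous_window(protein, codon_usage, width):
    """Find the window of `width` residues with the fewest possible encodings.

    Returns (start index, number of distinct nucleotide sequences), or
    (None, None) if the protein is shorter than the window.
    """
    best_start, best_degeneracy = None, None
    for start in range(len(protein) - width + 1):
        degeneracy = 1
        for aa in protein[start:start + width]:
            degeneracy *= len(codon_usage[aa])
        if best_degeneracy is None or degeneracy < best_degeneracy:
            best_start, best_degeneracy = start, degeneracy
    return best_start, best_degeneracy
```

A probe designed over the returned window needs the fewest degenerate positions, which is why residues such as Met and Trp (one codon each) are favoured.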


Subject(s)
Amino Acid Sequence , Base Sequence , Codon/genetics , Protein Biosynthesis , RNA, Messenger/genetics , Software/methods , Genes, Fungal , Information Systems , Mathematics , Saccharomyces cerevisiae/genetics
11.
Protein Seq Data Anal ; 3(4): 327-34, 1990 Sep.
Article in English | MEDLINE | ID: mdl-2235975

ABSTRACT

EMBL and GenBank keyword indexes have no hierarchical structure. In this paper we present a method for merging and reorganizing them in a tree structure whose primary roots are the keywords 'protein', 'DNA', 'RNA', and 'unclassified'. Synonymous keywords have been grouped together and erroneous keywords have been corrected. This taxonomic organization of keywords results in a more extensive and efficient retrieval which is further aided by "synonyms declaration". The tree has been produced using the computer programs GENPOINT and CREANET.


Subject(s)
Base Sequence , Databases, Factual , Information Storage and Retrieval , Abstracting and Indexing
12.
Nucleic Acids Res ; 18(13): 3745-52, 1990 Jul 11.
Article in English | MEDLINE | ID: mdl-2197595

ABSTRACT

A method is proposed for the automatic detection of serial periodicities in a linear sequence. Its application to DNA subtelomeric sequences from two lower eukaryotes, P.falciparum and S.cerevisiae, reveals ordered patterns organised in hierarchical periodicities, not easily recognizable by other methods. The possible implications concerning the evolution of tandemly repetitive arrays are discussed in light of a model which involves, as successive steps, random repeat modification, the fusion of differently modified repeat versions into longer units, and the amplification of (and/or homogenization to) the more recent repeat units.
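
The abstract does not describe how periodicities are detected. As a generic illustration only, and not the authors' method, one simple way to score a candidate period is the fraction of positions that match the position one period further along:

```python
def period_score(seq, period):
    """Fraction of positions i where seq[i] == seq[i + period]."""
    pairs = len(seq) - period
    if pairs <= 0:
        return 0.0
    matches = sum(seq[i] == seq[i + period] for i in range(pairs))
    return matches / pairs

def best_period(seq, max_period):
    """Smallest period in 1..max_period with the highest self-match score."""
    return max(range(1, max_period + 1), key=lambda p: period_score(seq, p))
```

In a hierarchically organised repeat, multiples of a short period also score highly, so scanning the full score profile is more informative than the single best value.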


Subject(s)
DNA, Fungal , DNA , Repetitive Sequences, Nucleic Acid , Algorithms , Animals , Base Sequence , Methods , Molecular Sequence Data , Plasmodium falciparum/genetics , Saccharomyces cerevisiae/genetics
13.
Comput Chem ; 20(1): 141-4, 1996 Mar.
Article in English | MEDLINE | ID: mdl-8867845

ABSTRACT

The important role that untranslated regions of mRNAs (UTR) may play in gene regulation and expression is now widely accepted. For this reason we developed UTRDB, a specialized database of 5'- and 3'-UTR of eukaryotic mRNAs cleaned of redundancy. This paper describes the composition and the general features of UTRDB. The analysis of UTRDB by using suitable statistical methods could provide useful information for guiding the experimental work aimed at elucidating the role of UTR sequences in gene regulation and expression.


Subject(s)
Protein Biosynthesis/genetics , RNA, Messenger/chemistry , Algorithms , Animals , Gene Expression Regulation/genetics , Human Genome Project , Information Systems , RNA, Messenger/genetics , Sequence Analysis , Software
14.
Nucleic Acids Res ; 28(1): 153-4, 2000 Jan 01.
Article in English | MEDLINE | ID: mdl-10592208

ABSTRACT

The AMmtDB database (http://bio-www.ba.cnr.it:8000/srs6/) has been updated by collecting the multi-aligned sequences of Chordata mitochondrial genes coding for proteins and tRNAs. The genes coding for proteins are multi-aligned based on the translated sequences and both the nucleotide and amino acid multi-alignments are provided. AMmtDB data selected through SRS can be viewed and managed using GeneDoc or other programs for the management of multi-aligned data depending on the user's operating system. The multiple alignments have been produced with CLUSTALW and PILEUP programs and then carefully optimized manually.


Subject(s)
Chordata, Nonvertebrate/genetics , DNA, Mitochondrial/genetics , Databases, Factual , Animals , Internet , Sequence Alignment
15.
Comput Appl Biosci ; 9(5): 541-5, 1993 Oct.
Article in English | MEDLINE | ID: mdl-8293327

ABSTRACT

A new string searching algorithm is presented aimed at searching for the occurrence of character patterns in longer character texts. The algorithm, specifically designed for nucleic acid sequence data, is essentially derived from the Boyer-Moore method (Comm. ACM, 20, 762-772, 1977). Both pattern and text data are compressed so that the natural 4-letter alphabet of nucleic acid sequences is considerably enlarged. The string search starts from the last character of the pattern and proceeds in large jumps through the text to be searched. The data compression and searching algorithm allows one to avoid searching for patterns not present in the text as well as to inspect, for each pattern, all text characters until the exact match with the text is found. These considerations are supported by empirical evidence and comparisons with other methods.
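
The right-to-left comparison and large jumps described here can be illustrated with the Boyer-Moore-Horspool simplification of the Boyer-Moore method. This sketch works on plain nucleotide strings and omits the compression step, which the abstract mentions but does not specify, so it shows only the skipping idea and is not the published algorithm.

```python
def horspool_search(text, pattern):
    """Return the start indices of all occurrences of pattern in text."""
    m, n = len(pattern), len(text)
    if m == 0 or m > n:
        return []
    # Shift for each character = distance from its rightmost occurrence
    # (excluding the final pattern position) to the end of the pattern.
    shift = {ch: m - 1 - i for i, ch in enumerate(pattern[:-1])}
    hits = []
    pos = 0
    while pos <= n - m:
        j = m - 1
        # Compare from the last pattern character backwards.
        while j >= 0 and text[pos + j] == pattern[j]:
            j -= 1
        if j < 0:
            hits.append(pos)
        # Characters absent from the pattern allow a jump of the full length m.
        pos += shift.get(text[pos + m - 1], m)
    return hits
```

With a four-letter alphabet most characters do occur in the pattern, so jumps stay short; packing several nucleotides into one symbol, as the abstract describes, enlarges the alphabet and makes full-length jumps more frequent.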


Subject(s)
Algorithms , Sequence Analysis, DNA/methods , Software , Base Sequence , DNA/genetics , Databases, Factual , Molecular Sequence Data , Pattern Recognition, Automated , Sequence Analysis, DNA/statistics & numerical data
16.
Brief Bioinform ; 1(3): 236-49, 2000 Sep.
Article in English | MEDLINE | ID: mdl-11465035

ABSTRACT

The crucial role of the non-coding portion of genomes is now widely acknowledged. In particular, mRNA untranslated regions are involved in many post-transcriptional regulatory pathways that control mRNA localisation, stability and translation efficiency. A review is given of the most recent research works on the functional characterisation of eukaryotic mRNA untranslated regions. In order to make possible a systematic and detailed sequence analysis of mRNA untranslated regions (UTRs), a non-redundant database of metazoan mRNA untranslated sequences annotated for the occurrence of specific functional elements, UTRdb, was devised. These elements, whose consensus structure has been devised on the basis of experimental assays and of comparative analyses, have been collected in the UTRsite database. A suitable pattern-matching software has been devised to search UTRsite patterns in user-submitted sequences, also assessing their statistical significance. Structural, compositional and evolutionary features of untranslated sequences of metazoan mRNAs have been investigated showing peculiar intra- and interspecific patterns.


Subject(s)
RNA, Messenger/genetics , Untranslated Regions , Animals , Base Sequence , Computational Biology , Databases, Factual , Eukaryotic Cells , Evolution, Molecular , Humans , Internet , Molecular Sequence Data , Protein Biosynthesis , RNA Stability , RNA, Messenger/chemistry , RNA, Messenger/metabolism
17.
Bioinformatics ; 16(5): 439-50, 2000 May.
Article in English | MEDLINE | ID: mdl-10871266

ABSTRACT

MOTIVATION: The identification of sequence patterns involved in gene regulation and expression is a major challenge in molecular biology. In this paper we describe a novel algorithm and the software for searching nucleotide and protein sequences for complex nucleotide patterns including potential secondary structure elements, also allowing for mismatches/mispairings below a user-fixed threshold, and assessing the statistical significance of their occurrence through a Markov chain simulation. RESULTS: The application of the proposed algorithm allowed the identification of some functional elements, such as the Iron Responsive Element, the Histone stem-loop structure and the Selenocysteine Insertion Sequence, located in the mRNA untranslated regions of post-transcriptionally regulated genes with the assessment of sensitivity and selectivity of the searching method. AVAILABILITY: A Web interface is available at: http://bigarea.area.ba.cnr.it:8000/EmbIT/Patsearch.html.
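
Two ingredients named in the abstract, matching with a mismatch threshold and significance by Markov chain simulation, can be sketched in simplified form. The sketch handles only plain linear patterns (no secondary structure), and the first-order chain and empirical p-value are assumptions; the abstract gives neither the chain order nor the test statistic.

```python
import random
from collections import defaultdict

def approximate_hits(seq, pattern, max_mismatches):
    """Positions where pattern matches seq with at most max_mismatches substitutions."""
    hits = []
    for i in range(len(seq) - len(pattern) + 1):
        mismatches = sum(a != b for a, b in zip(seq[i:i + len(pattern)], pattern))
        if mismatches <= max_mismatches:
            hits.append((i, mismatches))
    return hits

def simulate_markov(seq, rng):
    """Random sequence of the same length drawn from seq's first-order transitions."""
    transitions = defaultdict(list)
    for a, b in zip(seq, seq[1:]):
        transitions[a].append(b)
    out = [rng.choice(seq)]
    while len(out) < len(seq):
        followers = transitions.get(out[-1])
        # Fall back to a base drawn from the whole sequence if a symbol has no successor.
        out.append(rng.choice(followers) if followers else rng.choice(seq))
    return "".join(out)

def empirical_p_value(seq, pattern, max_mismatches, n_sim=200, seed=0):
    """Share of simulated sequences with at least as many hits as the real one."""
    rng = random.Random(seed)
    observed = len(approximate_hits(seq, pattern, max_mismatches))
    as_extreme = sum(
        len(approximate_hits(simulate_markov(seq, rng), pattern, max_mismatches)) >= observed
        for _ in range(n_sim)
    )
    return (as_extreme + 1) / (n_sim + 1)
```

Simulating from the sequence's own transition frequencies preserves its dinucleotide composition, so a pattern is judged against sequences of similar base composition rather than against uniform random text.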


Subject(s)
Sequence Alignment/methods , Software , 3' Untranslated Regions , 5' Untranslated Regions , Algorithms , Amino Acid Sequence , Animals , Base Sequence , Computer Simulation , DNA/chemistry , DNA/genetics , Databases, Factual , Histones/genetics , Humans , Nucleic Acid Conformation , Pattern Recognition, Automated , RNA, Messenger/chemistry , RNA, Messenger/genetics , Sensitivity and Specificity , Sequence Alignment/statistics & numerical data
18.
Nucleic Acids Res ; 26(1): 192-5, 1998 Jan 01.
Article in English | MEDLINE | ID: mdl-9399833

ABSTRACT

The important role the untranslated regions of eukaryotic mRNAs may play in gene regulation and expression is now widely acknowledged. For this reason we developed UTRdb, a specialized database of 5'- and 3'-untranslated sequences of eukaryotic mRNAs cleaned of redundancy. UTRdb entries are enriched with specialized information not present in the primary databases, including the presence of functional patterns already demonstrated by experimental analysis to have some functional role. Such patterns are being collected in the UTRsite database (http://bio-www.ba.cnr.it:8000/srs5/), which can also be used with appropriate computational tools to detect known functional patterns contained in mRNA untranslated regions.


Subject(s)
Databases, Factual , Protein Biosynthesis , RNA, Messenger/genetics , Animals , Computer Communication Networks , Eukaryotic Cells , Humans , Information Storage and Retrieval
19.
Comput Appl Biosci ; 12(1): 1-8, 1996 Feb.
Article in English | MEDLINE | ID: mdl-8670613

ABSTRACT

A key concept in comparing sequence collections is the issue of redundancy. The production of sequence collections free from redundancy is undoubtedly very useful, both in performing statistical analyses and accelerating extensive database searching on nucleotide sequences. Indeed, publicly available databases contain multiple entries of identical or almost identical sequences. Performing statistical analysis on such biased data carries a high risk of assigning high significance to non-significant patterns. In order to carry out unbiased statistical analysis as well as more efficient database searching it is thus necessary to analyse sequence data that have been purged of redundancy. Given that an unambiguous definition of redundancy is impracticable for biological sequence data, in the present program a quantitative description of redundancy will be used, based on the measure of sequence similarity. A sequence is considered redundant if it shows a degree of similarity and overlapping with a longer sequence in the database greater than a threshold fixed by the user. In this paper we present a new algorithm based on an "approximate string matching" procedure, which is able to determine the overall degree of similarity between each pair of sequences contained in a nucleotide sequence database and to generate automatically nucleotide sequence collections free from redundancies.
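
The redundancy rule stated here (drop a sequence if it is sufficiently similar to a longer one) can be sketched as follows. The similarity measure is a stand-in: the sketch uses Python's `difflib.SequenceMatcher` instead of the paper's approximate string matching procedure, and defines similarity as the matched fraction of the shorter sequence, which is an assumption.

```python
from difflib import SequenceMatcher

def overlap_similarity(longer, shorter):
    """Fraction of the shorter sequence covered by blocks matching the longer one."""
    matcher = SequenceMatcher(None, longer, shorter, autojunk=False)
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return matched / len(shorter)

def remove_redundant(sequences, threshold):
    """Keep a sequence only if no longer (or equal-length, earlier) kept sequence
    reaches the similarity threshold with it. Sequences must be non-empty."""
    kept = []
    for seq in sorted(sequences, key=len, reverse=True):
        if all(overlap_similarity(k, seq) < threshold for k in kept):
            kept.append(seq)
    return kept
```

This compares every candidate with every kept sequence, so it is quadratic in the number of entries; a real database would need an indexing or prefiltering step first.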


Subject(s)
Databases, Factual , Sequence Alignment/methods , Software , Algorithms , Base Sequence , DNA/genetics , Evaluation Studies as Topic , Molecular Sequence Data , Sequence Alignment/statistics & numerical data , Sequence Analysis/methods , Sequence Analysis/statistics & numerical data , Sequence Homology, Nucleic Acid
20.
Nucleic Acids Res ; 28(1): 159-62, 2000 Jan 01.
Article in English | MEDLINE | ID: mdl-10592210

ABSTRACT

The current version of PLMItRNA is a database of tRNA molecules and genes identified in the mitochondria of all green plants (Viridiplantae). It is the enlargement of a previous database originally restricted to seed plants [Ceci,L.R., Volpicella,M., Liuni,S., Volpetti,V., Licciulli,F. and Gallerani,R. (1999) Nucleic Acids Res., 27, 156-157]. PLMItRNA reports information and multialignments on 254 genes and 16 tRNA molecules detected in 25 higher plants (one bryophyte and 24 vascular plants) and seven green algae. PLMItRNA is accessible via the WWW at http://bio-WWW.ba.cnr.it:8000/srs6/


Subject(s)
Databases, Factual , Mitochondria/metabolism , Plants/genetics , RNA, Transfer/genetics , Plants/classification