Results 1 - 11 of 11
1.
BMC Genomics ; 12: 166, 2011 Mar 30.
Article in English | MEDLINE | ID: mdl-21450072

ABSTRACT

BACKGROUND: The typical objective of genome-wide association (GWA) studies is to identify single-nucleotide polymorphisms (SNPs) and corresponding genes with the strongest evidence of association (the 'most-significant SNPs/genes' approach). Borrowing ideas from micro-array data analysis, we propose a new method, named RS-SNP, for detecting sets of genes enriched in SNPs moderately associated with the phenotype. RS-SNP assesses whether the number of significant SNPs, with p-value P ≤ α, belonging to a given SNP set S is statistically significant. The rationale of the proposed method is that two kinds of null hypotheses are taken into account simultaneously. In the first null model the genotype and the phenotype are assumed to be independent random variables, and the null distribution gives the probability that the number of significant SNPs in S exceeds the observed one by chance. The second null model assumes that the number of significant SNPs in S depends on the size of S and not on the identity of the SNPs in S. Statistical significance is assessed using non-parametric permutation tests. RESULTS: We applied RS-SNP to the Crohn's disease (CD) data set collected by the Wellcome Trust Case Control Consortium (WTCCC) and compared the results with GENGEN, an approach recently proposed in the literature. The enrichment analysis using RS-SNP and the set of pathways contained in the MSigDB C2 CP pathway collection highlighted 86 pathways rich in SNPs weakly associated with CD. Of these, 47 were also indicated to be significant by GENGEN. Similar results were obtained using the MSigDB C5 pathway collection. Many of the pathways found to be enriched by RS-SNP have a well-known connection to CD and often to inflammatory diseases. CONCLUSIONS: The proposed method is a valuable alternative to other techniques for enrichment analysis of SNP sets. It is well founded from a theoretical and statistical perspective. Moreover, the experimental comparison with GENGEN highlights that it is more robust with respect to false positive findings.
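
The second null model in this abstract (the count of significant SNPs in S depends only on the size of S) can be illustrated with a small permutation test. The following is a minimal sketch under stated assumptions, not the RS-SNP implementation: the function name `set_enrichment_pvalue`, its parameters, and the add-one p-value estimate are hypothetical choices made for illustration, and the first null model (permuting phenotype labels) is not covered.

```python
import random


def set_enrichment_pvalue(pvalues, snp_set, alpha=0.05, n_perm=10000, seed=0):
    """Permutation p-value for the number of significant SNPs in a SNP set.

    pvalues: dict mapping SNP id -> association p-value (assumed precomputed).
    snp_set: iterable of SNP ids forming the set S.
    Random sets of the same size as S are drawn from all SNPs, so only the
    size of S matters under the null, not the identity of its members.
    """
    rng = random.Random(seed)
    all_snps = list(pvalues)
    significant = {s for s in all_snps if pvalues[s] <= alpha}
    members = [s for s in snp_set if s in pvalues]
    observed = sum(1 for s in members if s in significant)

    hits = 0
    for _ in range(n_perm):
        sample = rng.sample(all_snps, len(members))
        if sum(1 for s in sample if s in significant) >= observed:
            hits += 1
    # Add-one estimate keeps the p-value strictly positive.
    return observed, (hits + 1) / (n_perm + 1)
```

A set whose members are all significant while most other SNPs are not should receive a small p-value; a set containing no significant SNP always receives a p-value of 1, because every random set matches or exceeds a count of zero.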


Subject(s)
Genome-Wide Association Study/methods; Models, Genetic; Polymorphism, Single Nucleotide; Crohn Disease/genetics; Genome, Human; Humans; Linkage Disequilibrium; Phenotype
2.
J Biomed Inform ; 43(3): 397-406, 2010 Jun.
Article in English | MEDLINE | ID: mdl-19796710

ABSTRACT

One of the major problems in genomics and medicine is the identification of gene networks and pathways deregulated in complex and polygenic diseases, like cancer. In this paper, we address the problem of assessing the variability of pathway analysis results obtained in different, independent genome-wide expression studies in which the same phenotypic conditions are assayed. To this end, we assessed the deregulation of 1891 curated gene sets in four independent gene expression data sets of subjects affected by colorectal cancer (CRC). In this comparison we used two well-founded statistical models for evaluating deregulation of gene networks. We found that the results of pathway analysis in expression studies are highly reproducible. Our study revealed 53 pathways identified by the two methods in all four data sets analyzed, with high statistical significance and strong biological relevance to the pathology examined. This set of pathways, associated with single markers as well as with whole altered biological processes, constitutes a signature of the disease which sheds light on the genetic bases of CRC.


Subject(s)
Colorectal Neoplasms/genetics; Genome, Human; Genomics/methods; Colorectal Neoplasms/metabolism; Gene Expression Regulation, Neoplastic; Gene Regulatory Networks; Humans; Models, Statistical; Reproducibility of Results
3.
BMC Bioinformatics ; 10: 275, 2009 Sep 02.
Article in English | MEDLINE | ID: mdl-19725948

ABSTRACT

BACKGROUND: The analysis of high-throughput gene expression data with respect to sets of genes rather than individual genes has many advantages. A variety of methods have been developed for assessing the enrichment of sets of genes with respect to differential expression. In this paper we provide a comparative study of four of these methods: Fisher's exact test, Gene Set Enrichment Analysis (GSEA), Random-Sets (RS), and Gene List Analysis with Prediction Accuracy (GLAPA). The first three methods use associative statistics, while the fourth uses predictive statistics. We first compare all four methods on simulated data sets to verify that Fisher's exact test is markedly worse than the other three approaches. We then validate the other three methods on seven real data sets with known genetic perturbations and then compare the methods on two cancer data sets where our a priori knowledge is limited. RESULTS: The simulation study highlights that none of the three methods outperforms all others consistently. GSEA and RS are able to detect weak signals of deregulation, and they perform differently when genes in a gene set are both up- and down-regulated. GLAPA is more conservative, and large differences between the two phenotypes are required to allow the method to detect differential deregulation in gene sets. This is because the enrichment statistic in GLAPA is the prediction error, which is a stricter criterion than the classical two-sample statistics used in RS and GSEA. This was reflected in the analysis of real data sets, as GSEA and RS were seen to be significant for particular gene sets while GLAPA was not, suggesting a small effect size. We find that the rank of gene set enrichment induced by GLAPA is more similar to RS than to GSEA. More importantly, the rankings of the three methods share significant overlap. CONCLUSION: The three methods considered in our study recover relevant gene sets known to be deregulated in the experimental conditions and pathologies analyzed. There are differences between the three methods, and GSEA seems to be more consistent in finding enriched gene sets, although no method uniformly dominates over all data sets. Our analysis highlights the deep difference existing between associative and predictive methods for detecting enrichment and the use of both to better interpret results of pathway analysis. We close with suggestions for users of gene set methods.
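
Of the four methods compared in this abstract, Fisher's exact test is the simplest to write down. The sketch below shows the usual one-sided form for gene set enrichment, assuming a hard cutoff has already split the genes into differentially expressed and not. The function name and argument names are hypothetical, and nothing here reproduces the paper's simulation setup.

```python
from math import comb


def fisher_enrichment_pvalue(n_genes, n_de, set_size, overlap):
    """One-sided Fisher's exact test for gene set enrichment.

    n_genes:  total number of genes measured.
    n_de:     number of genes called differentially expressed.
    set_size: number of genes in the gene set.
    overlap:  number of differentially expressed genes inside the set.
    Returns P(X >= overlap) for X hypergeometric, i.e. the chance of seeing
    at least this many differentially expressed genes in a random set.
    """
    total = comb(n_genes, set_size)
    upper = min(n_de, set_size)
    tail = sum(
        comb(n_de, k) * comb(n_genes - n_de, set_size - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total
```

Because the test only sees the counts on each side of the cutoff, it ignores how strongly each gene is deregulated, which is one plausible reason it fares poorly against rank-based or predictive methods.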


Subject(s)
Computational Biology/methods; Gene Expression Profiling/methods; Algorithms; Databases, Genetic; Oligonucleotide Array Sequence Analysis/methods; Phenotype
4.
BMC Bioinformatics ; 10 Suppl 6: S2, 2009 Jun 16.
Article in English | MEDLINE | ID: mdl-19534745

ABSTRACT

BACKGROUND: The identification of protein coding elements in sets of mammalian conserved elements is one of the major challenges in current molecular biology research. Many features have been proposed for automatically distinguishing coding and non-coding conserved sequences, which makes a systematic statistical assessment of their differences necessary. A comprehensive study should be composed of an association study, i.e. a comparison of the distributions of the features in the two classes, and a prediction study in which the prediction accuracies of classifiers trained on single features and on groups of features are analyzed, conditional on the compared species and on the sequence lengths. RESULTS: In this paper we compared distributions of a set of comparative and non-comparative features and evaluated the prediction accuracy of classifiers trained for discriminating sequence elements conserved among human, mouse and rat species. The association study showed that the analyzed features are statistically different in the two classes. In order to study the influence of sequence length on feature performance, a predictive study was performed on different data sets composed of coding and non-coding alignments in equal number and of equal length, with ascending average length. We found that the most discriminant feature was a comparative measure indicating the proportion of synonymous nucleotide substitutions per synonymous site. Moreover, linear discriminant classifiers trained using comparative features in general outperformed classifiers based on intrinsic ones. Finally, the prediction accuracy of classifiers trained on comparative features increased significantly when intrinsic features were added to the set of input variables, independently of sequence length (Kolmogorov-Smirnov P-value

Subject(s)
Conserved Sequence; Open Reading Frames; Proteins/chemistry; Animals; Base Sequence; Genomics; Humans; Mice; Rats; Sequence Analysis; Species Specificity
5.
Sci Total Environ ; 672: 763-775, 2019 Jul 01.
Article in English | MEDLINE | ID: mdl-30974366

ABSTRACT

In fluvial basin analysis, sediment connectivity is an important element for defining channel dynamics. Nevertheless, although several approaches to quantify this concept have been trialed, there is considerable discussion about ways to measure and assess sediment connectivity. The present study investigates sediment connectivity through the definition of a new index, aiming to integrate functional aspects within a structural component. Our objective is to produce a sediment flow connectivity index (SCI) map, directly applicable to monitoring and management activities. Our SCI is defined as the result of the gradient-based flow accumulation of a sediment mobility index, which is in turn a simple function of rainfall, geotechnical properties of soil and land use. This method is here applied to the Vernazza basin (eastern Liguria, Italy), producing a sediment connectivity map that shows good performance in predicting the positions and accumulation paths of mobilized deposits detected on the ground after the October 25th, 2011, flood event. A further evaluation of the proposed index is performed through a comparison of the maps derived using the SCI and connectivity index (IC) developed by Cavalli et al. (2013), which highlights comparable quantitative overall performances, together with a slightly better qualitative identification of subtle sediment flow paths by the SCI. In spite of current limitations due to, e.g., the local nature of the final index, the availability of input information through open global datasets promises the potential application of this method to larger-scale assessments, paying attention to properly addressing upscaling and standardization issues.

6.
Artif Intell Med ; 40(1): 29-44, 2007 May.
Article in English | MEDLINE | ID: mdl-16920342

ABSTRACT

MOTIVATIONS: One of the main problems in cancer diagnosis using DNA microarray data is selecting genes relevant to the pathology by analyzing their expression profiles in tissues in two different phenotypical conditions. The question we pose is the following: how do we measure the relevance of a single gene in a given pathology? METHODS: A gene is relevant for a particular disease if we are able to correctly predict the occurrence of the pathology in new patients on the basis of its expression level only. In other words, a gene is informative for the disease if its expression levels are useful for training a classifier able to generalize, that is, able to correctly predict the status of new patients. In this paper we present a selection-bias-free, statistically well-founded method for finding relevant genes on the basis of their classification ability. RESULTS: We applied the method to a colon cancer data set and produced a list of relevant genes, ranked on the basis of their prediction accuracy. We found, out of more than 6500 available genes, 54 overexpressed in normal tissues and 77 overexpressed in tumor tissues having prediction accuracy greater than 70% with p-value

Subject(s)
Biomarkers, Tumor/genetics; Colonic Neoplasms/diagnosis; Gene Expression Profiling/methods; Gene Expression Regulation, Neoplastic; Genetic Testing; Oligonucleotide Array Sequence Analysis; Colonic Neoplasms/genetics; Humans; Models, Genetic; Models, Statistical; Predictive Value of Tests; Prognosis
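
The relevance criterion in this record (a gene is relevant if its expression level alone predicts the status of unseen patients) can be sketched as leave-one-out accuracy plus a label-permutation p-value. This is only an illustration under assumptions: the nearest-class-mean classifier is a simple stand-in chosen here, not the classifier used by the authors, and the function names and the 0/1 label encoding are hypothetical.

```python
import random


def loo_accuracy(values, labels):
    """Leave-one-out accuracy of a nearest-class-mean rule on one gene.

    values: expression levels of a single gene, one per sample.
    labels: class of each sample, encoded as 0 or 1.
    """
    n = len(values)
    correct = 0
    for i in range(n):
        sums = {0: 0.0, 1: 0.0}
        counts = {0: 0, 1: 0}
        for j in range(n):
            if j != i:  # sample i is held out and never seen in training
                sums[labels[j]] += values[j]
                counts[labels[j]] += 1
        if counts[0] == 0 or counts[1] == 0:
            continue  # a class vanished from training; count as an error
        mean0 = sums[0] / counts[0]
        mean1 = sums[1] / counts[1]
        pred = 0 if abs(values[i] - mean0) <= abs(values[i] - mean1) else 1
        correct += int(pred == labels[i])
    return correct / n


def permutation_pvalue(values, labels, n_perm=1000, seed=0):
    """Fraction of label shufflings whose LOO accuracy reaches the observed one."""
    rng = random.Random(seed)
    observed = loo_accuracy(values, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if loo_accuracy(values, shuffled) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)
```

Holding each sample out before computing the class means is what keeps the accuracy estimate free of the optimism that arises when the test sample also shapes the classifier.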
7.
BMC Bioinformatics ; 6 Suppl 4: S2, 2005 Dec 01.
Article in English | MEDLINE | ID: mdl-16351746

ABSTRACT

BACKGROUND: The advent of DNA microarray technology constitutes an epochal change in the classification and discovery of different types of cancer, because the information provided by DNA microarrays allows an approach to the problem of cancer analysis from a quantitative rather than qualitative point of view. Cancer classification requires well-founded mathematical methods which are able to predict the status of new specimens with high significance levels starting from a limited number of data. In this paper we assess the performance of Regularized Least Squares (RLS) classifiers, originally proposed in regularization theory, by comparing them with Support Vector Machines (SVM), the state-of-the-art supervised learning technique for cancer classification by DNA microarray data. The performance of both approaches has also been investigated with respect to the number of selected genes and different gene selection strategies. RESULTS: We show that RLS classifiers perform comparably to SVM classifiers, as the Leave-One-Out (LOO) error evaluated on three different data sets shows. The main advantage of RLS machines is that, to solve a classification problem, they use a linear system of order equal to either the number of features or the number of training examples. Moreover, RLS machines allow an exact measure of the LOO error to be obtained with just one training. CONCLUSION: RLS classifiers are a valuable alternative to SVM classifiers for the problem of cancer classification by gene expression data, due to their simplicity and low computational complexity. Moreover, RLS classifiers show generalization ability comparable to that of SVM classifiers even when the classification of new specimens involves very few gene expression levels.
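
The claim that RLS yields an exact LOO error from a single training follows from a standard identity for ridge-type estimators: the leave-one-out residual equals the ordinary residual divided by one minus the corresponding diagonal entry of the hat matrix. The sketch below illustrates this for a linear RLS model without an intercept, solved in the feature space (a system of order equal to the number of features). The helper names and the small Gauss-Jordan solver are assumptions made to keep the example dependency-free; they are not taken from the paper.

```python
def solve(A, B):
    """Solve A X = B by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [list(A[i]) + list(B[i]) for i in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        piv = M[c][c]
        M[c] = [v / piv for v in M[c]]
        for r in range(n):
            if r != c:
                f = M[r][c]
                M[r] = [a - f * b for a, b in zip(M[r], M[c])]
    return [row[n:] for row in M]


def rls_fit(X, y, lam):
    """Return (w, A_inv) with A = X^T X + lam*I and w = A^-1 X^T y."""
    d = len(X[0])
    A = [[sum(row[j] * row[k] for row in X) + (lam if j == k else 0.0)
          for k in range(d)] for j in range(d)]
    identity = [[1.0 if j == k else 0.0 for k in range(d)] for j in range(d)]
    A_inv = solve(A, identity)
    b = [sum(row[j] * t for row, t in zip(X, y)) for j in range(d)]
    w = [sum(A_inv[j][k] * b[k] for k in range(d)) for j in range(d)]
    return w, A_inv


def loo_residuals(X, y, lam):
    """Exact leave-one-out residuals obtained from one fit on all examples."""
    w, A_inv = rls_fit(X, y, lam)
    d = len(w)
    out = []
    for x, t in zip(X, y):
        pred = sum(wj * xj for wj, xj in zip(w, x))
        # h is the diagonal entry of the hat matrix X A^-1 X^T for this example.
        h = sum(x[j] * A_inv[j][k] * x[k] for j in range(d) for k in range(d))
        out.append((t - pred) / (1.0 - h))
    return out
```

With labels coded as +1 and -1, example i is misclassified in the leave-one-out sense when `y[i] - loo_residuals(X, y, lam)[i]` has the opposite sign to `y[i]`, so the LOO error count comes from one matrix inversion rather than one refit per example.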


Subject(s)
Computational Biology/methods; Gene Expression Regulation, Neoplastic; Least-Squares Analysis; Neoplasms/genetics; Oligonucleotide Array Sequence Analysis/methods; Algorithms; Cluster Analysis; Computer Simulation; Discriminant Analysis; Gene Expression Profiling; Humans; Models, Genetic; Models, Statistical; Nonlinear Dynamics; Pattern Recognition, Automated; Reproducibility of Results; Sensitivity and Specificity; Sequence Alignment; Software
8.
Dig Liver Dis ; 43(8): 623-31, 2011 Aug.
Article in English | MEDLINE | ID: mdl-21411385

ABSTRACT

BACKGROUND: A meta-analysis has re-analysed previous genome-wide association scans, definitively confirming eleven genes and further identifying 21 new loci. However, the identified genes/loci still explain only a minority of the genetic predisposition to Crohn's disease. AIMS: To identify genes weakly involved in disease predisposition by analysing chromosomal regions enriched in single nucleotide polymorphisms with modest statistical association. METHODS: We utilized the WTCCC data set, evaluating 1748 CD cases and 2938 controls. The identification of candidate genes/loci was performed by a two-step procedure: first, chromosomal regions enriched in weak association signals were localized; subsequently, weak signals clustered in gene regions were identified. The statistical significance was assessed by non-parametric permutation tests. RESULTS: The cytoband enrichment analysis highlighted 44 regions (P≤0.05) enriched with single nucleotide polymorphisms significantly associated with the trait, including 23 out of 31 previously confirmed and replicated genes. Importantly, we highlight a further 20 novel chromosomal regions carrying approximately one hundred genes/loci with modest association. Amongst these we find compelling functional candidate genes such as MAPT, GRB2 and CREM, LCT, and IL12RB2. CONCLUSION: Our study suggests a different statistical perspective to discover genes weakly associated with a given trait, although further confirmatory functional studies are needed.


Subject(s)
Crohn Disease/genetics; Genetic Predisposition to Disease; Genetic Variation; Polymorphism, Single Nucleotide/genetics; 3' Untranslated Regions; 5' Untranslated Regions; Genome-Wide Association Study; Humans; Introns
9.
Artif Intell Med ; 46(2): 131-8, 2009 Jun.
Article in English | MEDLINE | ID: mdl-18804983

ABSTRACT

MOTIVATIONS: A large number of single nucleotide polymorphisms (SNPs) are thought to be involved in the onset, differentiation and development of complex diseases. Univariate analysis is limited in studying complex traits since it does not take into account gene-gene interaction or the correlation of multiple SNPs with a specific phenotype. Moreover, it might underestimate gene variants with a weaker genetic contribution. Therefore more sophisticated techniques should be adopted when investigating the role of a panel of genetic markers in disease predisposition. METHODS: In this paper we describe a general method to simultaneously investigate the association between SNP profiles and Crohn's disease (CD), by evaluating the susceptibility or protective role of single markers or groups of markers. As an association measure we adopted a weighted linear combination of SNPs in which suitable weighting vectors belonged to predefined and over-complete vocabularies of vectors (frames), or were determined by the data. RESULTS: The proposed method found a weighted linear combination of SNPs statistically associated with CD (p = 3.81 × 10⁻¹⁰) describing the role of the markers in the pathology. In particular, MCP1-A2518G gave the major contribution as a protective locus, similarly to the TNF-alpha-C857T, DLG5 rs124869 and PTPN22 C1858T variants. The NF kappaB -94ATTG variant was found to be irrelevant for CD. For the remaining markers, a susceptibility role was attributed, also confirming that markers on the CARD15 gene, in particular G908R and L1007fsinsC, are involved in CD to the same extent as FcGIIIA G559T and TNF-alpha-G308A. Moreover, an odds ratio of 3.99 (p < 1.0 × 10⁻⁴) was assigned to this combination, which is greater than the best odds ratio found in the single-SNP analysis. CONCLUSIONS: Our methodology allowed us to statistically measure the association of a panel of SNPs with a specific phenotype. Therefore this approach could be suitable for a population screening program with simultaneous evaluation of a large set of gene polymorphisms.


Subject(s)
Crohn Disease/genetics; Epistasis, Genetic; Gene Expression Profiling; Polymorphism, Single Nucleotide; Humans
10.
BMC Med Genomics ; 2: 11, 2009 Mar 03.
Article in English | MEDLINE | ID: mdl-19257893

ABSTRACT

BACKGROUND: Aberrant DNA methylation of CpG islands of cancer-related genes is among the earliest and most frequent alterations in carcinogenesis and might be of value for either diagnosing cancer or evaluating recurrent disease. This mechanism usually leads to inactivation of tumour-suppressor genes. We designed the current study to validate our previous microarray data and to identify novel hypermethylated gene promoters. METHODS: The validation assay was performed in a different set of 8 patients with colorectal cancer (CRC) by means of quantitative reverse-transcriptase polymerase chain reaction analysis. The differential RNA expression profiles of three CRC cell lines before and after 5-aza-2'-deoxycytidine treatment were compared to identify the hypermethylated genes. The DNA methylation status of these genes was evaluated by means of bisulphite genomic sequencing and methylation-specific polymerase chain reaction (MSP) in the 3 cell lines and in tumour tissues from 30 patients with CRC. RESULTS: Data from our previous genome search were confirmed in the new set of 8 patients with CRC. In this validation set six genes showed a high induction after drug treatment in at least two of three CRC cell lines. Among them, the N-myc downstream-regulated gene 2 (NDRG2) promoter was found methylated in all CRC cell lines. NDRG2 hypermethylation was also detected in 8 out of 30 (27%) primary CRC tissues and was significantly associated with advanced AJCC stage IV. Normal colon tissues were not methylated. CONCLUSION: The findings highlight the usefulness of combining gene expression patterns and epigenetic data to identify tumour biomarkers, and suggest that NDRG2 silencing might influence tumour invasiveness, being associated with a more advanced stage.

11.
Int J Biol Sci ; 4(6): 368-78, 2008.
Article in English | MEDLINE | ID: mdl-18953405

ABSTRACT

Gene expression profiling offers a great opportunity for studying multi-factor diseases and for understanding the key role of genes in the mechanisms which drive a normal cell to a cancer state. Single gene analysis is insufficient to describe the complex perturbations responsible for cancer onset, progression and invasion. A deeper understanding of the mechanisms of tumorigenesis can be reached by focusing on the deregulation of gene sets or pathways rather than on individual genes. We apply two known and statistically well-founded methods for finding pathways and biological processes deregulated in pathological conditions by analyzing gene expression profiles. In particular, we measure the amount of deregulation and assess the statistical significance of predefined pathways belonging to a curated collection (Molecular Signature Database) in a colon cancer data set. We find that pathways strongly involved in different tumors are strictly connected with colon cancer. Moreover, our experimental results show that the study of complex diseases through pathway analysis is able to highlight genes weakly connected to the phenotype which may be difficult to detect using classical univariate statistics. Our study shows the importance of using gene sets rather than single genes for understanding the main biological processes and pathways involved in colorectal cancer. Our analysis shows that many of the genes involved in these pathways are strongly associated with colorectal tumorigenesis. In this new perspective, the focus shifts from finding differentially expressed genes to identifying biological processes, cellular functions and pathways perturbed in the phenotypic conditions by analyzing genes co-expressed in a given pathway as a whole, taking into account the possible interactions among them and, more importantly, the correlation of their expression with the phenotypical conditions.


Subject(s)
Colonic Neoplasms/genetics; Gene Expression Regulation, Neoplastic; Genes, Neoplasm/genetics; Aged; Colonic Neoplasms/metabolism; Female; Gene Expression Profiling; Humans; Male; Middle Aged; Oligonucleotide Array Sequence Analysis; Signal Transduction