Results 1 - 9 of 9
1.
Bioinformatics ; 26(14): 1752-8, 2010 Jul 15.
Article in English | MEDLINE | ID: mdl-20505004

ABSTRACT

MOTIVATION: Genome-wide association (GWA) studies have proven to be a successful approach for helping unravel the genetic basis of complex genetic diseases. However, the identified associations are not well suited for disease prediction, and only a modest portion of the heritability can be explained for most diseases, such as Type 2 diabetes or Crohn's disease. This may partly be due to the low power of standard statistical approaches to detect gene-gene and gene-environment interactions when small marginal effects are present. A promising alternative is Random Forests, which have already been successfully applied in candidate gene analyses. Important single nucleotide polymorphisms are detected by permutation importance measures. To this day, the application to GWA data was highly cumbersome with existing implementations because of the high computational burden. RESULTS: Here, we present the new freely available software package Random Jungle (RJ), which facilitates the rapid analysis of GWA data. The program yields valid results and computes up to 159 times faster than the fastest alternative implementation, while still maintaining all options of other programs. Specifically, it offers the different permutation importance measures available. It includes new options such as the backward elimination method. We illustrate the application of RJ to a GWA of Crohn's disease. The most important single nucleotide polymorphisms (SNPs) validate recent findings in the literature and reveal potential interactions. AVAILABILITY: The RJ software package is freely available at http://www.randomjungle.org


Subjects
Genome-Wide Association Study , Genomics/methods , Software , Crohn Disease/genetics , Genetic Predisposition to Disease , Human Genome , Humans , Single Nucleotide Polymorphism
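
The permutation importance idea mentioned in the abstract above can be illustrated with a minimal, model-agnostic sketch. It is not the Random Jungle implementation: a forest normally computes the measure per tree on out-of-bag samples, whereas this version shuffles one column at a time on a single evaluation set. The toy data and the `toy_model` predictor are hypothetical stand-ins for real genotypes and a trained forest.

```python
import random


def accuracy(predict, X, y):
    """Fraction of rows for which predict(row) equals the label."""
    return sum(predict(row) == label for row, label in zip(X, y)) / len(y)


def permutation_importance(predict, X, y, n_repeats=20, seed=0):
    """Mean drop in accuracy after shuffling each column in turn."""
    rng = random.Random(seed)
    baseline = accuracy(predict, X, y)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            # Rebuild the data with only column j permuted.
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(baseline - accuracy(predict, X_perm, y))
        importances.append(sum(drops) / n_repeats)
    return importances


# Toy data (assumption): genotypes coded 0/1/2; only SNP 0 drives the phenotype.
data_rng = random.Random(1)
X = [[data_rng.randint(0, 2) for _ in range(3)] for _ in range(200)]
y = [1 if row[0] >= 1 else 0 for row in X]


def toy_model(row):
    # Hypothetical stand-in for the prediction of a trained forest.
    return 1 if row[0] >= 1 else 0


importances = permutation_importance(toy_model, X, y)
```

In this toy setting the first SNP receives a clearly positive importance, while the two SNPs the model ignores receive exactly zero, because permuting them cannot change any prediction.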
2.
Circulation ; 117(13): 1675-84, 2008 Apr 01.
Article in English | MEDLINE | ID: mdl-18362232

ABSTRACT

BACKGROUND: Recently, genome-wide association studies identified variants on chromosome 9p21.3 as affecting the risk of coronary artery disease (CAD). We investigated the association of this locus with CAD in 7 case-control studies and undertook a meta-analysis. METHODS AND RESULTS: A single-nucleotide polymorphism (SNP), rs1333049, representing the 9p21.3 locus, was genotyped in 7 case-control studies involving a total of 4645 patients with myocardial infarction or CAD and 5177 controls. The mode of inheritance was determined. In addition, in 5 of the 7 studies, we genotyped 3 additional SNPs to assess a risk-associated haplotype (ACAC). Finally, a meta-analysis of the present data and previously published samples was conducted. A limited fine mapping of the locus was performed. The risk allele (C) of the lead SNP, rs1333049, was uniformly associated with CAD in each study (P<0.05). In a pooled analysis, the odds ratio per copy of the risk allele was 1.29 (95% confidence interval, 1.22 to 1.37; P=0.0001). Haplotype analysis further suggested that this effect was not homogeneous across the haplotypic background (test for interaction, P=0.0079). An autosomal-additive mode of inheritance best explained the underlying association. The meta-analysis of the rs1333049 SNP in 12,004 cases and 28,949 controls increased the overall level of evidence for association with CAD to P=6.04×10⁻¹⁰ (odds ratio, 1.24; 95% confidence interval, 1.20 to 1.29). Genotyping of 31 additional SNPs in the region identified several with a highly significant association with CAD, but none had predictive information beyond that of the rs1333049 SNP. CONCLUSIONS: This broad replication provides unprecedented evidence for association between genetic variants at chromosome 9p21.3 and risk of CAD.


Subjects
Human Chromosome Pair 9/genetics , Coronary Artery Disease/genetics , Genetic Variation , Single Nucleotide Polymorphism/genetics , Repetitive Nucleic Acid Sequences/genetics , Aged , Case-Control Studies , Coronary Artery Disease/epidemiology , Female , Genetic Markers/genetics , Genotype , Humans , Male , Middle Aged , Prospective Studies , Risk Factors
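
As an illustration of how per-study odds ratios can be combined in a meta-analysis such as the one in the abstract above, here is a minimal sketch of fixed-effect inverse-variance pooling on the log odds ratio scale. The abstract does not state which pooling model the authors used, so the fixed-effect choice is an assumption, and the three study estimates below are hypothetical numbers, not data from the paper.

```python
import math


def pool_fixed_effect(odds_ratios, ci_lowers, ci_uppers, z=1.96):
    """Pool odds ratios by inverse-variance weighting of their logs.

    Standard errors are recovered from the 95% confidence limits,
    assuming the intervals are symmetric on the log scale.
    """
    log_ors = [math.log(o) for o in odds_ratios]
    ses = [(math.log(u) - math.log(l)) / (2 * z)
           for l, u in zip(ci_lowers, ci_uppers)]
    weights = [1 / se ** 2 for se in ses]
    pooled_log = sum(w * b for w, b in zip(weights, log_ors)) / sum(weights)
    pooled_se = math.sqrt(1 / sum(weights))
    return (
        math.exp(pooled_log),
        math.exp(pooled_log - z * pooled_se),
        math.exp(pooled_log + z * pooled_se),
    )


# Hypothetical per-study estimates (illustrative only).
study_ors = [1.20, 1.30, 1.25]
study_lowers = [1.05, 1.10, 1.12]
study_uppers = [1.37, 1.54, 1.40]

pooled_or, pooled_lower, pooled_upper = pool_fixed_effect(
    study_ors, study_lowers, study_uppers
)
```

The pooled interval is narrower than any single study's interval, which is the sense in which adding studies increases the overall level of evidence.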
3.
Bioinformatics ; 24(1): 146-8, 2008 Jan 01.
Article in English | MEDLINE | ID: mdl-18024970

ABSTRACT

UNLABELLED: For the analysis of complex polygenic diseases, one does not expect all patients to share the same disease-associated alleles. Nor will disease-causing variations necessarily be assigned to identical sets of genes across patients. However, one does expect overlaps in the sets of genes that are involved and even more so in their assigned molecular processes. Furthermore, the assignment of single nucleotide polymorphisms (SNPs) to genes is highly ambiguous for intergenic SNPs. The tool presented here hence adds external information, i.e. GeneOntology (GO) terms (Gene Ontology Consortium), to the analysis of SNP data. AVAILABILITY: A web interface and source code are offered at https://webtools.imbs.uni-luebeck.de/snptogo


Subjects
Algorithms , Chromosome Mapping/methods , Genetic Databases , Gene Expression Profiling/methods , Single Nucleotide Polymorphism/genetics , Software , User-Computer Interface , DNA Mutational Analysis
4.
BMC Proc ; 3 Suppl 7: S65, 2009 Dec 15.
Article in English | MEDLINE | ID: mdl-20018059

ABSTRACT

Genome-wide association studies (GWAS) have helped to reveal genetic mechanisms of complex diseases. Although commonly used genotyping technology enables us to determine up to a million single-nucleotide polymorphisms (SNPs), causative variants are typically not genotyped directly. A favored approach to increase the power of genome-wide association studies is to impute the untyped SNPs using more complete genotype data of a reference population. Random forests (RFs) provide an internal method for replacing missing genotypes. A forest of classification trees is used to determine similarities of probands regarding their genotypes. These proximities are then used to impute genotypes of untyped SNPs. We evaluated this approach using genotype data of the Framingham Heart Study provided as Problem 2 for Genetic Analysis Workshop 16 and the Caucasian HapMap samples as reference population. Our results indicate that RFs are faster but less accurate than alternative approaches for imputing untyped SNPs.
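
The proximity-based imputation described in the abstract above can be sketched as follows. In a random forest, the proximity of two individuals is the fraction of trees in which they end up in the same terminal node. The sketch assumes the leaf assignments are already available and imputes one untyped SNP by a proximity-weighted vote over reference individuals. The leaf identifiers, genotypes, and function names are hypothetical, and the exact weighting used by the authors is not given in the abstract.

```python
from collections import defaultdict


def proximity(leaves_a, leaves_b):
    """Fraction of trees in which two individuals share a terminal node."""
    shared = sum(a == b for a, b in zip(leaves_a, leaves_b))
    return shared / len(leaves_a)


def impute_genotype(target_leaves, reference_leaves, reference_genotypes):
    """Proximity-weighted vote over reference genotypes at one untyped SNP."""
    votes = defaultdict(float)
    for leaves, genotype in zip(reference_leaves, reference_genotypes):
        votes[genotype] += proximity(target_leaves, leaves)
    return max(votes, key=votes.get)


# Hypothetical terminal-node ids in a forest of four trees.
reference_leaves = [[3, 7, 1, 4], [3, 7, 2, 4], [5, 8, 2, 6]]
reference_genotypes = [0, 0, 2]  # genotypes of the untyped SNP in the reference panel
target_leaves = [3, 7, 1, 6]     # study individual lacking this SNP

imputed = impute_genotype(target_leaves, reference_leaves, reference_genotypes)
```

Here the study individual shares most terminal nodes with the two reference individuals carrying genotype 0, so that genotype wins the vote.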

5.
BMC Proc ; 3 Suppl 7: S58, 2009 Dec 15.
Article in English | MEDLINE | ID: mdl-20018051

ABSTRACT

Genome-wide association studies have become standard in genetic epidemiology. Analyzing hundreds of thousands of markers simultaneously imposes some challenges for statisticians. One issue is the problem of multiplicity, which has been compared with the search for the needle in a haystack. To reduce the number of false-positive findings, a number of quality filters such as exclusion of single-nucleotide polymorphisms (SNPs) with a high missing fraction are employed. Another filter is exclusion of SNPs for which the calling algorithm had difficulties in assigning the genotypes. The only way to do this is the visual inspection of the cluster plots, also termed signal intensity plots, but this approach is often neglected. We developed an algorithm ACPA (automated cluster plot analysis), which performs this task automatically for autosomal SNPs. It is based on counting samples that lie too close to the cluster of a different genotype; SNPs are excluded when a certain threshold is exceeded. We evaluated ACPA using 1,000 randomly selected quality controlled SNPs from the Framingham Heart Study data that were provided for the Genetic Analysis Workshop 16. We compared the decision of ACPA with the decision made by two independent readers. We achieved a sensitivity of 88% (95% CI: 81%-93%) and a specificity of 86% (95% CI: 83%-89%). In a screening setting in which one aims at not losing any good SNP, we achieved 99% (95% CI: 98%-100%) specificity and still detected every second low-quality SNP.
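
The counting idea behind ACPA, as summarized in the abstract above, can be sketched under stated assumptions. The abstract gives neither the distance measure nor the thresholds, so the rule below (a sample is flagged when another genotype's cluster centre is closer than `factor` times the distance to its own centre) and the `max_fraction` cut-off are hypothetical choices made only for illustration.

```python
import math


def centroids(points, calls):
    """Mean signal intensity (x, y) per called genotype."""
    sums = {}
    for (x, y), g in zip(points, calls):
        sx, sy, n = sums.get(g, (0.0, 0.0, 0))
        sums[g] = (sx + x, sy + y, n + 1)
    return {g: (sx / n, sy / n) for g, (sx, sy, n) in sums.items()}


def count_ambiguous(points, calls, factor=2.0):
    """Count samples lying too close to the cluster of a different genotype."""
    centres = centroids(points, calls)
    flagged = 0
    for point, g in zip(points, calls):
        d_own = math.dist(point, centres[g])
        d_other = min(math.dist(point, c) for h, c in centres.items() if h != g)
        if d_other < factor * d_own:  # hypothetical closeness rule
            flagged += 1
    return flagged


def exclude_snp(points, calls, max_fraction=0.05, factor=2.0):
    """Exclude the SNP when the flagged fraction exceeds the threshold."""
    return count_ambiguous(points, calls, factor) / len(points) > max_fraction


# Hypothetical intensities: three tight clusters plus one stray "AA" call.
points = [(1.0, 0.1), (0.9, 0.1), (1.0, 0.0), (0.7, 0.35),
          (0.5, 0.5), (0.55, 0.5), (0.5, 0.45),
          (0.1, 1.0), (0.0, 0.9), (0.1, 0.9)]
calls = ["AA", "AA", "AA", "AA", "AB", "AB", "AB", "BB", "BB", "BB"]

n_flagged = count_ambiguous(points, calls)
excluded = exclude_snp(points, calls)
```

Only the stray sample between the AA and AB clusters is flagged, and with ten samples that single flag already exceeds the illustrative 5% threshold.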

6.
Nat Genet ; 41(3): 280-2, 2009 Mar.
Article in English | MEDLINE | ID: mdl-19198612

ABSTRACT

We present a three-stage analysis of genome-wide SNP data in 1,222 German individuals with myocardial infarction and 1,298 controls, in silico replication in three additional genome-wide datasets of coronary artery disease (CAD) and subsequent replication in approximately 25,000 subjects. We identified one new CAD risk locus on 3q22.3 in MRAS (P = 7.44 × 10⁻¹³; OR = 1.15, 95% CI = 1.11-1.19), and suggestive association with a locus on 12q24.31 near HNF1A-C12orf43 (P = 4.81 × 10⁻⁷; OR = 1.08, 95% CI = 1.05-1.11).


Subjects
Human Chromosome Pair 3 , Coronary Artery Disease/genetics , Genetic Predisposition to Disease , Quantitative Trait Loci , Case-Control Studies , Genome-Wide Association Study , Germany , Hepatocyte Nuclear Factor 1-alpha/genetics , Humans , Single Nucleotide Polymorphism , ras Proteins/genetics
7.
BMC Proc ; 1 Suppl 1: S59, 2007.
Article in English | MEDLINE | ID: mdl-18466559

ABSTRACT

With the development of high-throughput single-nucleotide polymorphism (SNP) technologies, the vast number of SNPs in smaller samples poses a challenge to the application of classical statistical procedures. A possible solution is to use a two-stage approach for case-control data in which, in the first stage, a screening test selects a small number of SNPs for further analysis. The second stage then estimates the effects of the selected variables using logistic regression (logReg). Here, we introduce a novel approach in which the selection of SNPs is based on the permutation importance estimated by random forests (RFs). For this, we used the simulated data provided for the Genetic Analysis Workshop 15 without knowledge of the true model. The data set was randomly split into a first and a second data set. In the first stage, RFs were grown to pre-select the 37 most important variables, and these were reduced to 32 variables by haplotype tagging. In the second stage, we estimated parameters using logReg. The highest effect estimates were obtained for five simulated loci. We detected smoking, gender, and the parental DR alleles as covariates. After correction for multiple testing, we identified two out of four genes simulated with a direct effect on rheumatoid arthritis risk and all covariates without any false positive. We showed that a two-staged approach with a screening of SNPs by RFs is suitable to detect candidate SNPs in genome-wide association studies for complex diseases.

8.
BMC Proc ; 1 Suppl 1: S9, 2007.
Article in English | MEDLINE | ID: mdl-18466593

ABSTRACT

Mutual information (MI) is a robust nonparametric statistical approach for identifying associations between genotypes and gene expression levels. Using the data of Problem 1 provided for the Genetic Analysis Workshop 15, we first compared a quantitative MI (Tsalenko et al. 2006 J Bioinform Comput Biol 4:259-4) with the standard analysis of variance (ANOVA) and the nonparametric Kruskal-Wallis (KW) test. We then proposed a novel feature selection approach using MI in a classification scenario to address the small n - large p problem and compared it with a feature selection that relies on an asymptotic χ² distribution. In both applications, we used a permutation-based approach for evaluating the significance of MI. Substantial discrepancies in significance were observed between MI, ANOVA, and KW that can be explained by different empirical distributions of the data. In contrast to ANOVA and KW, MI detects shifts in location when the data are non-normally distributed, skewed, or contaminated with outliers. ANOVA but not MI is often significant if one genotype with a small frequency had a remarkable difference in the average gene expression level relative to the other two genotypes. MI depends on genotype frequencies and cannot detect these differences. In the classification scenario, we show that our novel approach for feature selection identifies a smaller list of markers with higher accuracy compared to the standard method. In conclusion, permutation-based MI approaches provide reliable and flexible statistical frameworks which seem to be well suited for data that are non-normal, skewed, or have an otherwise peculiar distribution. They merit further methodological investigation.
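
A permutation-based significance test for MI, as referred to in the abstract above, can be sketched for the discrete case. The sketch treats both the genotype and the outcome as categorical, which corresponds to a classification setting; it is not the quantitative MI for continuous expression levels that the authors compared against ANOVA and KW. The toy genotypes and class labels are hypothetical.

```python
import math
import random
from collections import Counter


def mutual_information(xs, ys):
    """Plug-in estimate of MI (in bits) between two discrete variables."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(
        (c / n) * math.log2(c * n / (px[x] * py[y]))
        for (x, y), c in pxy.items()
    )


def permutation_p_value(xs, ys, n_perm=1000, seed=0):
    """Observed MI and its permutation p-value under shuffled labels."""
    rng = random.Random(seed)
    observed = mutual_information(xs, ys)
    shuffled = list(ys)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if mutual_information(xs, shuffled) >= observed:
            hits += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return observed, (hits + 1) / (n_perm + 1)


# Hypothetical data: carriers of at least one minor allele fall into class 1.
genotypes = [0] * 20 + [1] * 20 + [2] * 20
classes = [0] * 20 + [1] * 40

observed_mi, p_value = permutation_p_value(genotypes, classes)
```

Because the null distribution is built by shuffling the labels, no assumption about the shape of the data distribution is needed, which is the property the abstract highlights.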

9.
Genet Epidemiol ; 31 Suppl 1: S51-60, 2007.
Article in English | MEDLINE | ID: mdl-18046765

ABSTRACT

Genome-wide association studies using thousands to hundreds of thousands of single nucleotide polymorphism (SNP) markers and region-wide association studies using a dense panel of SNPs are already in use to identify disease susceptibility genes and to predict disease risk in individuals. Because these tasks become increasingly important, three different data sets were provided for the Genetic Analysis Workshop 15, thus allowing examination of various novel and existing data mining methods for both classification and identification of disease susceptibility genes, gene by gene or gene by environment interaction. The approach most often applied in this presentation group was random forests because of its simplicity, elegance, and robustness. It was used for prediction and for screening for interesting SNPs in a first step. The logistic tree with unbiased selection approach appeared to be an interesting alternative to efficiently select interesting SNPs. Machine learning, specifically ensemble methods, might be useful as pre-screening tools for large-scale association studies because they can be less prone to overfitting, can be less computer processor time intensive, can easily include pair-wise and higher-order interactions compared with standard statistical approaches and can also have a high capability for classification. However, improved implementations that are able to deal with hundreds of thousands of SNPs at a time are required.


Subjects
Computer Neural Networks , Single Nucleotide Polymorphism , Genetic Predisposition to Disease , Human Genome , Humans , Regression Analysis