1.
BMC Genomics ; 15: 81, 2014 Jan 29.
Article in English | MEDLINE | ID: mdl-24476119

ABSTRACT

BACKGROUND: Genetic variation associated with human leukocyte antigen (HLA) genes has immunological functions and is associated with autoimmune diseases. To date, large-scale studies involving classical HLA genes have been limited by time-consuming and expensive HLA-typing technologies. To reduce these costs, single-nucleotide polymorphisms (SNPs) have been used to predict HLA-allele types. Although HLA allelic distributions differ among populations, most prediction models for HLA genes are based on Caucasian samples, with few reported studies involving non-Caucasians. RESULTS: Our sample consisted of 437 Han Chinese individuals with Affymetrix 5.0 and Illumina 550 K SNPs, of whom 214 also had data on Affymetrix 6.0 SNPs. All individuals had HLA typings at a 4-digit resolution. Using these data, we built prediction models for HLA genes that are specific to a Han Chinese population. To optimize these models, we analyzed a number of critical parameters, including flanking-region size, genotyping platform, and imputation. Predictive accuracies generally increased with both sample size and SNP density. CONCLUSIONS: SNP data from the HapMap Project are about five times denser than commercially available genotype chip data. Using chips to genotype our samples, however, reduced the accuracy of our HLA predictions by only ~3%, while saving a great deal of time and expense. We demonstrated that classical HLA alleles can be predicted from SNP genotype data with a high level of accuracy (from 80.37% for HLA-B to 95.79% for HLA-DQB1) in a Han Chinese population. This finding offers researchers new opportunities to obtain HLA genotypes by prediction from their existing chip datasets. Since the genetic variation structure (e.g. SNP, HLA, linkage disequilibrium) differs between Han Chinese and Caucasians, and has a strong impact on building prediction models for HLA genes, our findings emphasize the importance of building ethnic-specific models when analyzing human populations.


Subjects
Asian People/genetics , HLA Antigens/genetics , Polymorphism, Single Nucleotide , 3' Flanking Region , 5' Flanking Region , Alleles , China , Gene Frequency , Genotype , HLA-B Antigens/genetics , HLA-DQ beta-Chains/genetics , HapMap Project , Humans , Linkage Disequilibrium
2.
J Biomed Sci ; 21: 88, 2014 Aug 30.
Article in English | MEDLINE | ID: mdl-25175702

ABSTRACT

BACKGROUND: Genome-wide association studies have been successful in identifying common genetic variants for human diseases. However, much of the heritable variation associated with diseases such as Parkinson's disease remains unknown, suggesting that many more risk loci are yet to be identified. Rare variants have become important in disease association studies for explaining missing heritability. Methods for detecting this type of association require prior knowledge of candidate genes and combine variants within the region. These methods may suffer from power loss in situations with many neutral variants or causal variants with opposite effects. RESULTS: We propose a method capable of scanning genetic variants to identify the region most likely to harbour a disease gene with rare and/or common causal variants. Our method assigns a score to each individual variant based on our scoring system and uses aggregate scores to identify the region associated with disease. We evaluate performance by simulation based on 1000 Genomes sequencing data and compare it with three commonly used methods. We use a Parkinson's disease case-control dataset as a model to demonstrate the application of our method. Our method has better power than CMC and WSS and similar power to SKAT-O, with well-controlled type I error, under simulation based on 1000 Genomes sequencing data. In real data analysis, we confirm the association of the α-synuclein gene (SNCA) with Parkinson's disease (p = 0.005). We further identify associations with hyaluronan synthase 2 (HAS2, p = 0.028) and kringle containing transmembrane protein 1 (KREMEN1, p = 0.006). KREMEN1 is associated with the Wnt signalling pathway, which has been shown to play an important role in neurodegeneration in Parkinson's disease. CONCLUSIONS: Our method is time efficient and less sensitive to the inclusion of neutral variants and to the direction of effect of causal variants. It can narrow down a genomic region or a chromosome to a disease-associated region. Using Parkinson's disease as a model, our method not only confirms association for a known gene but also identifies two genes previously found by other studies. Despite the many existing methods, we conclude that our method serves as an efficient alternative for exploring genomic data containing both rare and common variants.


Subjects
Genetic Variation , Genome-Wide Association Study/methods , Parkinson Disease/genetics , Case-Control Studies , Humans , Models, Theoretical
3.
Genet Epidemiol ; 36(6): 594-601, 2012 Sep.
Article in English | MEDLINE | ID: mdl-22807216

ABSTRACT

Genome-wide association studies (GWAS) have become the method of choice for identifying disease susceptibility genes in common disease genetics research. Despite successes in these studies, much of the heritability remains unexplained owing to lack of power and low resolution. High-density genotyping arrays can now screen more than 5 million genetic markers. As a result, multiple comparisons have become an important issue, especially in the era of next-generation sequencing. We propose a two-stage maximal segmental score (MSS) procedure, which uses region-specific empirical P-values to identify the genomic segments most likely to harbor the disease gene. We develop scoring systems based on Fisher's P-value combining method to convert locus-specific significance levels into region-specific scores. In simulations, MSS increased the power to detect genetic association compared with conventional methods, provided that the type I error was held at 5%. We demonstrated the application of MSS on a publicly available case-control dataset of Parkinson's disease and replicated findings in the literature. MSS provides an efficient exploratory tool for high-density association data in the current era of next-generation sequencing. R source code implementing the MSS procedure is freely available at http://www.csjfann.ibms.sinica.edu.tw/EAG/program/programlist.htm.


Subjects
Genetic Markers , Genome-Wide Association Study/methods , Models, Genetic , Case-Control Studies , Computer Simulation , Gene Frequency , Genetic Predisposition to Disease , Haplotypes/genetics , Humans , Parkinson Disease/genetics , Software
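
Note on record 3: the abstract says that locus-specific significance levels are converted into region-specific scores with Fisher's P-value combining method. The following is a minimal Python sketch of that combining step only, under stated assumptions. It is not the authors' R code: the function names `fisher_combined_p` and `best_window` and the fixed-width window scan are hypothetical, and the closed-form tail probability assumes independent P-values. Neighboring markers are usually correlated, which is presumably why the published procedure relies on empirical P-values instead.

```python
import math


def fisher_combined_p(p_values):
    """Combine independent P-values with Fisher's method.

    The statistic X = -2 * sum(ln p_i) follows a chi-square distribution
    with 2k degrees of freedom under the null (k = number of P-values).
    For even degrees of freedom the upper tail has a closed form, so no
    external library is needed.
    """
    k = len(p_values)
    if k == 0:
        raise ValueError("need at least one P-value")
    if any(not (0.0 < p <= 1.0) for p in p_values):
        raise ValueError("P-values must lie in (0, 1]")
    half = -sum(math.log(p) for p in p_values)  # this is X / 2
    # P(chi2_{2k} > X) = exp(-X/2) * sum_{j=0}^{k-1} (X/2)^j / j!
    term, total = 1.0, 1.0
    for j in range(1, k):
        term *= half / j
        total += term
    return math.exp(-half) * total


def best_window(p_values, width):
    """Hypothetical scan: return (start, combined_p) for the contiguous
    window of `width` markers with the smallest combined P-value."""
    if not 1 <= width <= len(p_values):
        raise ValueError("width must be between 1 and len(p_values)")
    scored = [
        (fisher_combined_p(p_values[s:s + width]), s)
        for s in range(len(p_values) - width + 1)
    ]
    combined_p, start = min(scored)
    return start, combined_p
```

For example, `best_window([0.8, 0.6, 0.001, 0.002, 0.7], 2)` selects the window starting at index 2, where the two small P-values sit side by side.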
4.
Genomics ; 97(2): 77-85, 2011 Feb.
Article in English | MEDLINE | ID: mdl-21111805

ABSTRACT

Haplotype-based approaches may have greater power than single-locus analyses when the SNPs are in strong linkage disequilibrium with the risk locus. To overcome potential complexities owing to the large numbers of haplotypes in genetic studies, we evaluated two data mining approaches, multifactor dimensionality reduction (MDR) and classification and regression tree (CART), applied to haplotypes while accounting for haplotype uncertainty, to detect haplotype-haplotype (HH) interactions. In detecting HH interactions, MDR had higher power than CART but a slightly higher type I error. Additionally, we performed an HH interaction analysis with a publicly available dataset of Parkinson's disease and confirmed previous findings that the RET proto-oncogene is associated with the disease. In this study, we showed that HH interaction analysis can help researchers gain more insight into genetic risk factors for complex diseases.


Subjects
Haplotypes/genetics , Multifactor Dimensionality Reduction/methods , Polymorphism, Single Nucleotide , Algorithms , Computer Simulation , Data Mining , Humans , Linkage Disequilibrium , Parkinson Disease/genetics , Proto-Oncogene Mas , Proto-Oncogene Proteins c-ret/genetics , Regression Analysis
5.
BMC Bioinformatics ; 11: 111, 2010 Feb 27.
Article in English | MEDLINE | ID: mdl-20187971

ABSTRACT

BACKGROUND: Combining data from different ethnic populations in a study can increase the efficacy of methods designed to identify expression quantitative trait loci (eQTL) compared with analyzing each population independently. In such studies, however, the diversity of minor allele frequencies among populations has rarely been taken into account. Because populations differ in both allele frequencies and population-level expression, no consensus has been reached on the optimal statistical approach for eQTL analysis of data combining different populations. RESULTS: In this report, we explored the applicability of a constrained two-way model to identify eQTL in combined ethnic data that might contain genetic diversity among ethnic populations. In addition, gene expression differences resulting from allele frequency diversity between populations were directly estimated and analyzed by the constrained two-way model. Through simulation, we investigated the effects of genetic diversity on eQTL identification by examining gene expression data pooled after normal quantile transformation of each population. Using the constrained two-way model to reanalyze data from Caucasian and Asian individuals available from HapMap, we identified a large number of eQTL with similar genetic effects on gene expression levels in these two populations. Furthermore, we identified 19 single nucleotide polymorphisms with inter-population differences in both genotype frequency and genotype-directed gene expression levels, which reflected a clear distinction between Caucasian and Asian individuals. CONCLUSIONS: This study illustrates the influence of minor allele frequencies on common eQTL identification using either separate or combined population data. Our findings are important for future eQTL studies in which different datasets are combined to increase the power of eQTL identification.


Subjects
Genetics, Population , Genomics/methods , Models, Statistical , Quantitative Trait Loci , Databases, Genetic , Gene Expression Profiling/methods , Humans
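
Note on record 5: the abstract mentions pooling gene expression data after a normal quantile transformation of each population. The sketch below shows one common way to do such a transformation in Python, and it rests on assumptions: the rank offset `(rank - 0.5) / n` is one of several conventions in use, the abstract does not say which the authors chose, and the function names `normal_quantile_transform` and `pool_populations` are hypothetical.

```python
from statistics import NormalDist


def normal_quantile_transform(values):
    """Map values to standard-normal quantiles by rank.

    Tied values share the average of their ranks. The probability assigned
    to rank r is (r - 0.5) / n, an assumed convention that keeps every
    probability strictly inside (0, 1).
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    standard_normal = NormalDist()
    return [standard_normal.inv_cdf((r - 0.5) / n) for r in ranks]


def pool_populations(expression_by_population):
    """Transform each population separately, then concatenate the results
    as (population_name, transformed_value) pairs."""
    pooled = []
    for name, values in expression_by_population.items():
        pooled.extend((name, z) for z in normal_quantile_transform(values))
    return pooled
```

Because the transformation is applied within each population, a population-wide shift or rescaling of expression values does not carry over into the pooled data; only the within-population ordering does.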
6.
Gene ; 533(1): 304-12, 2014 Jan 01.
Article in English | MEDLINE | ID: mdl-24076437

ABSTRACT

Identifying susceptibility genes that influence complex diseases is extremely difficult because loci often influence the disease state through genetic interactions. Numerous approaches to detect disease-associated SNP-SNP interactions have been developed, but none consistently generates high-quality results under different disease scenarios. Using summarizing techniques to combine a number of existing methods may provide a solution to this problem. Here we used three popular non-parametric methods (Gini, absolute probability difference (APD), and entropy) to develop two novel summary scores, namely the principal component score (PCS) and the Z-sum score (ZSS), with which to predict disease-associated genetic interactions. We used a simulation study to compare the performance of the non-parametric scores, the summary scores, the scaled-sum score (SSS; used in polymorphism interaction analysis (PIA)), and multifactor dimensionality reduction (MDR). The non-parametric methods achieved high power, but no non-parametric method outperformed all others under a variety of epistatic scenarios. PCS and ZSS, however, outperformed MDR. PCS, ZSS, and SSS displayed controlled type I errors (<0.05), whereas the Gini, APD, and entropy scores (GS, APDS, and ES) did not (>0.05). A real data study using the Genetic Analysis Workshop 16 (GAW16) rheumatoid arthritis dataset identified a number of interesting SNP-SNP interactions.


Subjects
Polymorphism, Single Nucleotide , Humans , Principal Component Analysis , Probability
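
Note on record 6: two of the three non-parametric measures named in the abstract, Gini and entropy, are standard impurity measures. The Python sketch below computes the textbook weighted versions over the genotype-combination cells of a SNP pair. It is an illustration under assumptions rather than the paper's scoring: the abstract does not give the exact form of the Gini, APD, or entropy scores, nor how PCS and ZSS are built from them, so none of that is reproduced here, and the function name `impurity_scores` is hypothetical.

```python
import math


def impurity_scores(case_counts, control_counts):
    """Weighted Gini impurity and entropy over genotype-combination cells.

    case_counts[i] and control_counts[i] are the numbers of cases and
    controls carrying genotype combination i (nine cells for a pair of
    biallelic SNPs). Lower values mean the cells separate cases from
    controls more cleanly.
    """
    if len(case_counts) != len(control_counts):
        raise ValueError("count lists must have the same length")
    total = sum(case_counts) + sum(control_counts)
    if total == 0:
        raise ValueError("no observations")
    gini = 0.0
    entropy = 0.0
    for cases, controls in zip(case_counts, control_counts):
        n = cases + controls
        if n == 0:
            continue  # empty cells contribute nothing
        p = cases / n          # proportion of cases in this cell
        weight = n / total     # share of all samples falling in this cell
        gini += weight * 2.0 * p * (1.0 - p)
        if 0.0 < p < 1.0:
            entropy -= weight * (p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p))
    return gini, entropy
```

Cells that contain only cases or only controls contribute zero to both measures, so a genotype table that splits the two groups perfectly scores `(0.0, 0.0)`.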
7.
PLoS One ; 9(5): e97513, 2014.
Article in English | MEDLINE | ID: mdl-24818602

ABSTRACT

Advances in biotechnology have resulted in large-scale studies of DNA methylation. A differentially methylated region (DMR) is a genomic region with multiple adjacent CpG sites that exhibit different methylation statuses among multiple samples. Many so-called "supervised" methods have been established to identify DMRs between two or more comparison groups. Methods for the identification of DMRs without reference to phenotypic information are, however, less well studied. We propose an alternative "unsupervised" approach, in which DMRs in the studied samples are identified by taking into account the natural dependence structure of methylation measurements between neighboring probes on tiling arrays. Through a simulation study, we investigated the effects of dependencies between neighboring probes on determining DMRs and found that many spurious signals would be produced if each probe's methylation data were analyzed independently. In contrast, our newly proposed method corrected for this effect, with a well-controlled false positive rate and comparable sensitivity. By applying it to two real datasets, we demonstrated that our method can provide a global picture of methylation variation in the studied samples. R source code implementing the proposed method is freely available at http://www.csjfann.ibms.sinica.edu.tw/eag/programlist/ICDMR/ICDMR.html.


Subjects
Computational Biology/methods , DNA Methylation , Astrocytoma/genetics , Cluster Analysis , CpG Islands/genetics , Humans , Models, Genetic
8.
PLoS One ; 7(7): e40996, 2012.
Article in English | MEDLINE | ID: mdl-22815890

ABSTRACT

Complex diseases are typically caused by combinations of molecular disturbances that vary widely among patients. Endophenotypes, combinations of genetic factors associated with a disease, offer a simplified approach to dissecting complex traits by reducing genetic heterogeneity. Because molecular dissimilarities often exist between patients with indistinguishable disease symptoms, these unique molecular features may reflect pathogenic heterogeneity. To detect molecular dissimilarities among patients and reduce the complexity of high-dimensional data, we explored an endophenotype-identification analytical procedure that combines non-negative matrix factorization (NMF) and the adjusted Rand index (ARI), a measure of the similarity of two clusterings of a data set. To evaluate this procedure, we compared it with a commonly used method, principal component analysis with k-means clustering (PCA-K). A simulation study with a gene expression dataset and genotype information was conducted to examine the performance of our procedure and PCA-K. The results showed that NMF mostly outperformed PCA-K. Additionally, we applied our procedure to a publicly available dataset derived from patients with late-onset Alzheimer's disease (LOAD). NMF distilled information associated with 1,116 transcripts into three metagenes and three molecular subtypes (MS) for patients in the LOAD dataset: MS1 (n1=80), MS2 (n2=73), and MS3 (n3=23). ARI was then used to determine the most representative transcripts for each metagene; 123, 89, and 71 metagene-specific transcripts were identified for MS1, MS2, and MS3, respectively. These metagene-specific transcripts were identified as the endophenotypes. Our results showed that 14, 38, 0, and 28 candidate susceptibility genes listed in the AlzGene database were found for all patients, MS1, MS2, and MS3, respectively. Moreover, we found that MS2 might be a normal-like subtype. Our proposed procedure provides an alternative approach to investigating the pathogenic mechanism of disease and to better understanding the relationship between phenotype and genotype.


Subjects
Endophenotypes , Genetic Diseases, Inborn/genetics , Models, Genetic , Algorithms , Alzheimer Disease/genetics , Cluster Analysis , Computer Simulation , Gene Expression Profiling , Genotype , Humans , Models, Statistical , Oligonucleotide Array Sequence Analysis/methods , Polymorphism, Single Nucleotide , RNA, Messenger/metabolism , Transcription, Genetic
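
Note on record 8: the abstract describes the adjusted Rand index (ARI) as a measure of the similarity of two clusterings of a data set. The sketch below implements the standard pair-counting formula in plain Python to make that definition concrete. It illustrates the index only; how the authors applied ARI to select metagene-specific transcripts is not described in the abstract and is not reproduced. The function name `adjusted_rand_index` is a choice made for this sketch.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand index between two clusterings of the same items.

    Returns 1.0 for identical partitions (up to relabeling) and values
    near 0.0 for agreement no better than chance.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("label lists must have the same length")
    pairs_total = comb(len(labels_a), 2)
    if pairs_total == 0:
        return 1.0  # zero or one item: the partitions trivially agree
    # Number of item pairs placed together in both, in A, and in B.
    sum_joint = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    sum_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    sum_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = sum_a * sum_b / pairs_total  # agreement expected by chance
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:
        # Only happens when both partitions are a single cluster or both
        # are all singletons, i.e. they are identical.
        return 1.0
    return (sum_joint - expected) / (max_index - expected)
```

The index ignores label names, so `adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0])` is 1.0, while splitting one of the two clusters, as in `[0, 0, 1, 2]`, lowers the value to 4/7.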