1.
Hum Mol Genet ; 32(4): 677-684, 2023 01 27.
Article in English | MEDLINE | ID: mdl-36164742

ABSTRACT

Crohn's disease (CD) and ulcerative colitis (UC), two major subtypes of inflammatory bowel disease, show substantial differences in their clinical course and treatment response. To identify the genetic factors underlying the distinct characteristics of these two diseases, we performed a genome-wide association study (GWAS) between CD (n = 2359) and UC (n = 2175) in a Korean population, followed by replication in an independent sample of 772 CD and 619 UC cases. Two novel loci were identified with divergent effects on CD and UC: rs9842650 in CD200 and rs885026 in NCOR2. In addition, the seven established susceptibility loci [major histocompatibility complex (MHC), TNFSF15, OTUD3, USP12, IL23R, FCHSD2 and RIPK2] reached genome-wide significance. Of the nine loci, six (MHC, TNFSF15, OTUD3, USP12, IL23R and CD200) were replicated in the case-case GWAS of European populations. The proportion of variance explained in CD-UC status by polygenic risk score analysis was up to 22.6%. The area under the receiver-operating characteristic curve value was 0.74, suggesting acceptable discrimination between CD and UC. This CD-UC GWAS provides new insights into genetic differences between the two diseases with similar symptoms and might be useful in improving their diagnosis and treatment.
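
For reference, an area under the ROC curve such as the 0.74 reported above can be read as the probability that a randomly chosen CD case receives a higher score than a randomly chosen UC case. The following is a minimal sketch of that calculation in Python. The function name `auc_from_scores` and the toy scores are illustrative assumptions, not the authors' code or data.

```python
def auc_from_scores(pos_scores, neg_scores):
    """Rank-based AUC: share of (positive, negative) pairs ranked correctly.

    Tied pairs count as half a correct ranking.
    """
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Hypothetical polygenic risk scores, invented for illustration only.
cd_scores = [0.9, 0.8, 0.4]
uc_scores = [0.7, 0.3, 0.2]
toy_auc = auc_from_scores(cd_scores, uc_scores)
```

The pairwise loop is quadratic in the number of samples, which is fine for a sketch; a rank-based computation would be preferable for cohorts of the size described in the abstract.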


Subjects
Ulcerative Colitis, Crohn Disease, Humans, Ulcerative Colitis/genetics, Crohn Disease/genetics, Genome-Wide Association Study, Genetic Predisposition to Disease, Genetic Loci, Single Nucleotide Polymorphism/genetics, Tumor Necrosis Factor Ligand Superfamily Member 15/genetics, Carrier Proteins/genetics, Membrane Proteins/genetics, Ubiquitin-Specific Proteases/genetics
2.
Am J Hum Genet ; 109(11): 1974-1985, 2022 11 03.
Article in English | MEDLINE | ID: mdl-36206757

ABSTRACT

Almost always, the analysis of single-cell RNA-sequencing (scRNA-seq) data begins with the generation of the low dimensional embedding of the data by principal-component analysis (PCA). Because scRNA-seq data are count data, log transformation is routinely applied to correct skewness prior to PCA, which is often argued to have added bias to data. Alternatively, studies have proposed methods that directly assume a count model and use approximately normally distributed count residuals for PCA. Despite their theoretical advantage of directly modeling count data, these methods are extremely slow for large datasets. In fact, when the data size grows, even the standard log normalization becomes inefficient. Here, we present FastRNA, a highly efficient solution for PCA of scRNA-seq data based on a count model accounting for both batches and cell size factors. Although we assume the same general count model as previous methods, our method uses two orders of magnitude less time and memory than the other count-based methods and an order of magnitude less time and memory than the standard log normalization. This achievement results from our unique algebraic optimization that completely avoids the formation of the large dense residual matrix in memory. In addition, our method enjoys a benefit that the batch effects are eliminated from data prior to PCA. Generating a batch-accounted PC of an atlas-scale dataset with 2 million cells takes less than a minute and 1 GB memory with our method.


Subjects
RNA, Single-Cell Analysis, Humans, RNA Sequence Analysis/methods, Single-Cell Analysis/methods, Principal Component Analysis, Exome Sequencing, Gene Expression Profiling
3.
BMC Bioinformatics ; 25(1): 24, 2024 Jan 12.
Article in English | MEDLINE | ID: mdl-38216869

ABSTRACT

BACKGROUND: Meta-analysis is a statistical method that combines the results of multiple studies to increase statistical power. When multiple studies participating in a meta-analysis utilize the same public dataset as controls, the summary statistics from these studies become correlated. To solve this challenge, Lin and Sullivan proposed a method to provide an optimal test statistic adjusted for the correlation. This method quickly became the standard practice. However, we identified an unexpected power asymmetry phenomenon in this standard framework. This can lead to unbalanced power for detecting protective minor alleles and risk minor alleles. RESULTS: We found that the power asymmetry of the current framework is mainly due to the errors in approximating the correlation term. We then developed a meta-analysis method based on an accurate correlation estimator, called PASTRY (A method to avoid Power ASymmeTRY). PASTRY outperformed the standard method on both simulated and real datasets in terms of the power symmetry. CONCLUSIONS: Our findings suggest that PASTRY can help to alleviate the power asymmetry problem. PASTRY is available at https://github.com/hanlab-SNU/PASTRY .
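
To make the role of the correlation term concrete, here is a minimal sketch of a weighted z-score meta-analysis whose denominator accounts for correlation between study statistics. It is a generic textbook construction, not the PASTRY estimator or the Lin and Sullivan implementation. The function names, the equal weights, and the correlation value 0.5 are assumptions chosen for illustration.

```python
import math


def combine_correlated_z(z, weights, corr):
    """Combine per-study z-scores when the studies are correlated.

    The variance of sum(w_i * z_i) is sum_ij w_i * w_j * corr[i][j],
    so dividing by its square root keeps the result standard normal
    under the null hypothesis.
    """
    k = len(z)
    numerator = sum(w * zi for w, zi in zip(weights, z))
    variance = sum(weights[i] * weights[j] * corr[i][j]
                   for i in range(k) for j in range(k))
    return numerator / math.sqrt(variance)


def two_sided_p(z):
    """Two-sided p-value of a standard normal statistic."""
    return math.erfc(abs(z) / math.sqrt(2.0))


# Two hypothetical studies with z = 2.0 each and equal weights.
z_scores = [2.0, 2.0]
weights = [1.0, 1.0]
independent = [[1.0, 0.0], [0.0, 1.0]]
shared_controls = [[1.0, 0.5], [0.5, 1.0]]  # assumed correlation of 0.5

z_indep = combine_correlated_z(z_scores, weights, independent)
z_shared = combine_correlated_z(z_scores, weights, shared_controls)
```

Ignoring a positive correlation inflates the combined statistic (here about 2.83 instead of about 2.31), which is why the accuracy of the correlation estimate matters for the power behavior the abstract describes.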


Subjects
Genome-Wide Association Study, Single Nucleotide Polymorphism, Humans, Genome-Wide Association Study/methods, Alleles, Research
4.
Hum Mol Genet ; 31(22): 3934-3944, 2022 11 10.
Article in English | MEDLINE | ID: mdl-35512355

ABSTRACT

Genome-wide association studies (GWAS) of Crohn's disease (CD) in European populations and of leprosy in Chinese populations have shown that CD and leprosy share genetic risk loci. As these shared loci were identified through cross-comparisons across different ethnic populations, we hypothesized that meta-analysis of GWAS on CD and leprosy in East Asian populations would increase power to identify additional shared loci. We performed a cross-disease meta-analysis of GWAS data from CD (1621 cases and 4419 controls) and leprosy (2901 cases and 3801 controls), followed by replication in additional datasets comprising 738 CD cases and 488 controls and 842 leprosy cases and 925 controls. We identified one novel locus at 7p22.3, rs77992257 in intron 2 of ADAP1, shared between CD and leprosy with genome-wide significance (P = 3.80 × 10⁻¹¹) and confirmed 10 previously established loci in both diseases: IL23R, IL18RAP, IL12B, RIPK2, TNFSF15, ZNF365-EGR2, CCDC88B, LACC1, IL27 and NOD2. Polygenic risk scores derived from Chinese leprosy data explained up to 5.28% of the phenotypic variance of Korean CD, supporting similar genetic structures between the two diseases. Although CD and leprosy shared a substantial number of genetic susceptibility loci in East Asians, the majority of shared susceptibility loci showed allelic effects in the opposite direction. Investigation of the genetic correlation using cross-trait linkage disequilibrium score regression also showed a negative genetic correlation between CD and leprosy (rg [SE] = -0.40 [0.13], P = 2.6 × 10⁻³). These observations raise the possibility that CD might be caused by hypersensitive reactions toward pathogenic stimuli.


Subjects
Crohn Disease, Leprosy, Humans, Genome-Wide Association Study, Crohn Disease/genetics, Genetic Predisposition to Disease, Single Nucleotide Polymorphism/genetics, Asian People/genetics, Genetic Loci, Leprosy/genetics, Case-Control Studies, Tumor Necrosis Factor Ligand Superfamily Member 15/genetics
5.
Hum Mol Genet ; 31(15): 2655-2667, 2022 08 17.
Article in English | MEDLINE | ID: mdl-35043955

ABSTRACT

Human leukocyte antigen (HLA) gene variants in the major histocompatibility complex (MHC) region are associated with numerous complex human diseases and quantitative traits. Previous phenome-wide association studies (PheWAS) for this region demonstrated that HLA association patterns to the phenome have both population-specific and population-shared components. We performed MHC PheWAS in the Korean population by analyzing associations between phenotypes and genetic variants in the MHC region using the Korea Biobank Array project data samples from the Korean Genome and Epidemiology Study cohorts. Using this single-population dataset, we curated and analyzed 82 phenotypes for 125 673 Korean individuals after imputing HLA using CookHLA, a recently developed imputation framework. More than one-third of these phenotypes showed significant associations, confirming 56 known associations and discovering 13 novel association signals that were not reported previously. In addition, we analyzed heritability explained by the variants in the MHC region and genetic correlations among phenotypes based on the MHC variants.


Subjects
Genome-Wide Association Study, Single Nucleotide Polymorphism, Asian People/genetics, Genetic Predisposition to Disease, Humans, Major Histocompatibility Complex/genetics, Phenomics, Phenotype, Single Nucleotide Polymorphism/genetics
6.
Am J Hum Genet ; 108(1): 36-48, 2021 01 07.
Article in English | MEDLINE | ID: mdl-33352115

ABSTRACT

Identifying and interpreting pleiotropic loci is essential to understanding the shared etiology among diseases and complex traits. A common approach to mapping pleiotropic loci is to meta-analyze GWAS summary statistics across multiple traits. However, this strategy does not account for the complex genetic architectures of traits, such as genetic correlations and heritabilities. Furthermore, the interpretation is challenging because phenotypes often have different characteristics and units. We propose PLEIO (Pleiotropic Locus Exploration and Interpretation using Optimal test), a summary-statistic-based framework to map and interpret pleiotropic loci in a joint analysis of multiple diseases and complex traits. Our method maximizes power by systematically accounting for genetic correlations and heritabilities of the traits in the association test. Any set of related phenotypes, binary or quantitative traits with different units, can be combined seamlessly. In addition, our framework offers interpretation and visualization tools to help downstream analyses. Using our method, we combined 18 traits related to cardiovascular disease and identified 13 pleiotropic loci, which showed four different patterns of associations.


Subjects
Genetic Pleiotropy/genetics, Genome-Wide Association Study/methods, Cardiovascular Diseases/genetics, Genetic Predisposition to Disease/genetics, Humans, Phenotype, Single Nucleotide Polymorphism/genetics, Quantitative Trait Loci/genetics
7.
J Autoimmun ; 145: 103206, 2024 May.
Article in English | MEDLINE | ID: mdl-38554656

ABSTRACT

Crohn's disease (CD) is a chronic inflammatory disorder affecting the bowel wall. Tissue-resident memory T (Trm) cells are implicated in CD, yet their characteristics remain unclear. We aimed to investigate the transcriptional profiles and functional characteristics of Trm cells in the small bowel of CD and their interactions with immune cells. Seven patients with CD and four with ulcerative colitis as controls were included. Single-cell RNA sequencing and paired T cell receptor sequencing assessed T cell subsets and transcriptional signatures in lamina propria (LP) and submucosa/muscularis propria-enriched fractions (SM/MP) from small bowel tissue samples. We detected 58,123 T cells grouped into 16 populations, including the CD4+ Trm cells with a Th17 signature and CD8+ Trm clusters. In CD, CD4+ Trm cells with a Th17 signature, termed Th17 Trm, showed significantly increased proportions within both the LP and SM/MP areas. The Th17 Trm cluster demonstrated heightened expression of tissue-residency marker genes (ITGAE, ITGA1, and CXCR6) along with elevated levels of IL17A, IL22, CCR6, and CCL20. The clonal expansion of Th17 Trm cells in CD was accompanied by enhanced transmural dynamic potential, as indicated by significantly higher migration scores. CD-prominent Th17 Trm cells displayed an increased interferon gamma (IFNγ)-related signature possibly linked with STAT1 activation, inducing chemokines (i.e., CXCL10, CXCL8, and CXCL9) in myeloid cells. Our findings underscored the elevated Th17 Trm cells throughout the small bowel in CD, contributing to disease pathogenesis through IFNγ induction and subsequent chemokine production in myeloid cells.


Subjects
Crohn Disease, Immunologic Memory, Memory T Cells, Th17 Cells, Humans, Crohn Disease/immunology, Crohn Disease/genetics, Crohn Disease/pathology, Th17 Cells/immunology, Th17 Cells/metabolism, Memory T Cells/immunology, Memory T Cells/metabolism, Male, Female, Adult, Middle Aged, Intestinal Mucosa/immunology, Intestinal Mucosa/metabolism, Intestinal Mucosa/pathology, T-Lymphocyte Subsets/immunology, T-Lymphocyte Subsets/metabolism, Biomarkers, Gene Expression Profiling, Young Adult
8.
Nucleic Acids Res ; 50(12): e71, 2022 07 08.
Article in English | MEDLINE | ID: mdl-35420135

ABSTRACT

The standard analysis pipeline for single-cell RNA-seq data consists of sequential steps initiated by clustering the cells. An innate limitation of this pipeline is that an imperfect clustering result can irreversibly affect the succeeding steps. For example, there can be cell types not well distinguished by clustering because they largely share the global structure, such as the anterior primitive streak and mid primitive streak cells. If one searches for differentially expressed genes (DEGs) solely based on clustering, marker genes for distinguishing these types will be missed. Moreover, clustering depends on many parameters and is often subject to manual decisions. To overcome these limitations, we propose MarcoPolo, a method that identifies informative DEGs independently of prior clustering. MarcoPolo ranks genes by evaluating whether their expression distributions are bimodal, whether similar expression patterns are observed in other genes, and whether the expressing cells are proximal in a low-dimensional space. Using real datasets with FACS-purified cell labels, we demonstrate that MarcoPolo recovers marker genes better than competing methods. Notably, MarcoPolo finds key genes that can distinguish cell types that are not distinguishable by the standard clustering. MarcoPolo is built in a convenient software package that provides analysis results in an HTML file.


Subjects
Single-Cell Analysis, Software, Algorithms, Biomarkers, Cluster Analysis, Gene Expression Profiling/methods, RNA-Seq, RNA Sequence Analysis/methods, Single-Cell Analysis/methods, Exome Sequencing
9.
PLoS Genet ; 17(6): e1009596, 2021 06.
Article in English | MEDLINE | ID: mdl-34061836

ABSTRACT

The rapid decrease in sequencing cost has enabled genetic studies to discover rare variants associated with complex diseases and traits. Once such an association is identified, the next step is to understand the genetic mechanism by which the rare variants influence disease. As hypothesized for common variants, rare variants may affect diseases by regulating gene expression, and recently, several studies have identified the effects of rare variants on gene expression using heritability and expression outlier analyses. However, identifying individual genes whose expression is regulated by rare variants has been challenging due to the relatively small sample size of expression quantitative trait loci studies and statistical approaches not optimized to detect the effects of rare variants. In this study, we analyze whole-genome sequencing and RNA-seq data of 681 European individuals collected for the Genotype-Tissue Expression (GTEx) project (v8) to identify individual genes in 49 human tissues whose expression is regulated by rare variants. To improve statistical power, we develop an approach based on a likelihood ratio test that combines effects of multiple rare variants in a nonlinear manner and has higher power than previous approaches. Using GTEx data, we identify many genes regulated by rare variants, and some of them are only regulated by rare variants and not by common variants. We also find that genes regulated by rare variants are enriched for expression outliers and disease-causing genes. These results point to regulatory effects of rare variants, which would be important in interpreting associations of rare variants with complex traits.


Subjects
Gene Expression Regulation, Quantitative Trait Loci, Humans, Multifactorial Inheritance
10.
Bioinformatics ; 37(3): 416-418, 2021 04 20.
Article in English | MEDLINE | ID: mdl-32735319

ABSTRACT

SUMMARY: Fine-mapping human leukocyte antigen (HLA) genes involved in disease susceptibility to individual alleles or amino acid residues has been challenging. Using information regarding HLA alleles obtained from HLA typing, HLA imputation or HLA inference, our software expands the alleles to amino acid sequences using the most recent IMGT/HLA database and prepares a dataset suitable for fine-mapping analysis. Our software also provides useful functionalities, such as various association tests, visualization tools and nomenclature conversion. AVAILABILITY AND IMPLEMENTATION: https://github.com/WansonChoi/HATK.


Subjects
HLA Antigens, Software, Alleles, Amino Acid Sequence, Chromosome Mapping, Genetic Predisposition to Disease, HLA Antigens/genetics, Histocompatibility Testing, Humans
11.
Am J Physiol Lung Cell Mol Physiol ; 321(1): L130-L143, 2021 07 01.
Article in English | MEDLINE | ID: mdl-33909500

ABSTRACT

Genome-wide association studies (GWASs) have identified regions associated with chronic obstructive pulmonary disease (COPD). GWASs of other diseases have shown an approximately 10-fold overrepresentation of nonsynonymous variants, despite limited exonic coverage on genotyping arrays. We hypothesized that a large-scale analysis of coding variants could discover novel genetic associations with COPD, including rare variants with large effect sizes. We performed a meta-analysis of exome arrays from 218,399 controls and 33,851 moderate-to-severe COPD cases. All exome-wide significant associations were present in regions previously identified by GWAS. We did not identify any novel rare coding variants with large effect sizes. Within GWAS regions on chromosomes 5q, 6p, and 15q, four coding variants were conditionally significant (P < 0.00015) when adjusting for lead GWAS single-nucleotide polymorphisms. A common gasdermin B (GSDMB) splice variant (rs11078928) previously associated with a decreased risk for asthma was nominally associated with a decreased risk for COPD [minor allele frequency (MAF) = 0.46, P = 1.8e-4]. Two stop variants in coiled-coil α-helical rod protein 1 (CCHCR1), a gene involved in regulating cell proliferation, were associated with COPD (both P < 0.0001). The SERPINA1 Z allele was associated with a random-effects odds ratio of 1.43 for COPD (95% confidence interval = 1.17-1.74), though with marked heterogeneity across studies. Overall, COPD-associated exonic variants were identified in genes involved in DNA methylation, cell-matrix interactions, cell proliferation, and cell death. In conclusion, we performed the largest exome array meta-analysis of COPD to date and identified potential functional coding variants. Future studies are needed to identify rarer variants and further define the role of coding variants in COPD pathogenesis.


Subjects
Exome/genetics, Genetic Markers, Genetic Predisposition to Disease, Genome-Wide Association Study, Single Nucleotide Polymorphism, Chronic Obstructive Pulmonary Disease/genetics, Chronic Obstructive Pulmonary Disease/pathology, Gene Expression Regulation, Humans, Meta-Analysis as Topic
12.
Hum Mol Genet ; 28(20): 3498-3513, 2019 10 15.
Article in English | MEDLINE | ID: mdl-31211845

ABSTRACT

Many immune diseases occur at different rates among people with schizophrenia compared to the general population. Here, we evaluated whether this phenomenon might be explained by shared genetic risk factors. We used data from large genome-wide association studies to compare the genetic architecture of schizophrenia to 19 immune diseases. First, we evaluated the association with schizophrenia of 581 variants previously reported to be associated with immune diseases at genome-wide significance. We identified five variants with potentially pleiotropic effects. While colocalization analyses were inconclusive, functional characterization of these variants provided the strongest evidence for a model in which genetic variation at rs1734907 modulates risk of schizophrenia and Crohn's disease via altered methylation and expression of EPHB4, a gene whose protein product guides the migration of neuronal axons in the brain and the migration of lymphocytes towards infected cells in the immune system. Next, we investigated genome-wide sharing of common variants between schizophrenia and immune diseases using cross-trait LD score regression. Of the 11 immune diseases with available genome-wide summary statistics, we observed genetic correlation between six immune diseases and schizophrenia: inflammatory bowel disease (rg = 0.12 ± 0.03, P = 2.49 × 10⁻⁴), Crohn's disease (rg = 0.097 ± 0.06, P = 3.27 × 10⁻³), ulcerative colitis (rg = 0.11 ± 0.04, P = 4.05 × 10⁻³), primary biliary cirrhosis (rg = 0.13 ± 0.05, P = 3.98 × 10⁻³), psoriasis (rg = 0.18 ± 0.07, P = 7.78 × 10⁻³) and systemic lupus erythematosus (rg = 0.13 ± 0.05, P = 3.76 × 10⁻³). With the exception of ulcerative colitis, the degree and direction of these genetic correlations were consistent with the expected phenotypic correlation based on epidemiological data. Our findings suggest shared genetic risk factors contribute to the epidemiological association of certain immune diseases and schizophrenia.


Subjects
Genetic Predisposition to Disease/genetics, Immune System Diseases/etiology, Immune System Diseases/genetics, Schizophrenia/etiology, Schizophrenia/genetics, Genome-Wide Association Study, Humans, Immune System Diseases/epidemiology, Single Nucleotide Polymorphism/genetics, Schizophrenia/epidemiology
13.
Hum Mol Genet ; 27(22): 3901-3910, 2018 11 15.
Article in English | MEDLINE | ID: mdl-30084967

ABSTRACT

Crohn's disease (CD) and ulcerative colitis (UC) are the major types of chronic inflammatory bowel disease (IBD) characterized by recurring episodes of inflammation of the gastrointestinal tract. Although it is well established that human leukocyte antigen (HLA) is a major risk factor for IBD, it is yet to be determined which HLA alleles or amino acids drive the risks of CD and UC in Asians. To define the roles of HLA for IBD in Asians, we fine-mapped HLA in 12 568 individuals from Korea and Japan (3294 patients with CD, 1522 patients with UC and 7752 controls). We identified that the amino acid position 37 of HLA-DRβ1 plays a key role in the susceptibility to CD (presence of serine being protective, P = 3.6 × 10⁻⁶⁷, OR = 0.48 [0.45-0.52]). For UC, we confirmed the known association of the haplotype spanning HLA-C*12:02, HLA-B*52:01 and HLA-DRB1*15:02 (P = 1.2 × 10⁻²⁸, OR = 4.01 [3.14-5.12]).


Subjects
Ulcerative Colitis/genetics, Crohn Disease/genetics, Genetic Predisposition to Disease, HLA-DRB1 Chains/genetics, Inflammatory Bowel Diseases/genetics, Alleles, Amino Acid Substitution/genetics, Amino Acids/chemistry, Amino Acids/genetics, Asian People/genetics, Ulcerative Colitis/pathology, Crohn Disease/pathology, Female, Genetic Association Studies, Genotype, HLA-DRB1 Chains/chemistry, Haplotypes/genetics, Humans, Inflammatory Bowel Diseases/pathology, Japan, Male, Protein Conformation, Republic of Korea
14.
J Proteome Res ; 18(8): 3195-3202, 2019 08 02.
Article in English | MEDLINE | ID: mdl-31314536

ABSTRACT

Deep learning (DL), a type of machine learning approach, is a powerful tool for analyzing large sets of data that are derived from biomedical sciences. However, it remains unknown whether DL is suitable for identifying contributing factors, such as biomarkers, in quantitative proteomics data. In this study, we describe an optimized DL-based analytical approach using a data set that was generated by selected reaction monitoring-mass spectrometry (SRM-MS), comprising SRM-MS data from 1008 samples for the diagnosis of pancreatic cancer, to test its classification power. Its performance was compared with that of 5 conventional multivariate and machine learning methods: random forest (RF), support vector machine (SVM), logistic regression (LR), k-nearest neighbors (k-NN), and naïve Bayes (NB). The DL method yielded the best classification (AUC 0.9472 for the test data set) of all approaches. We also optimized the parameters of DL individually to determine which factors were the most significant. In summary, the DL method has advantages in classifying the quantitative proteomics data of pancreatic cancer patients, and our results suggest that its implementation can improve the performance of diagnostic assays in clinical settings.


Subjects
Deep Learning/statistics & numerical data, Machine Learning/statistics & numerical data, Mass Spectrometry/statistics & numerical data, Proteomics/statistics & numerical data, Algorithms, Bayes Theorem, Cluster Analysis, Humans, Logistic Models, Pancreatic Neoplasms/diagnosis, Pancreatic Neoplasms/pathology, Support Vector Machine/statistics & numerical data
15.
Am J Hum Genet ; 99(1): 89-103, 2016 Jul 07.
Article in English | MEDLINE | ID: mdl-27292110

ABSTRACT

Genome-wide association studies (GWASs) have been successful in detecting variants correlated with phenotypes of clinical interest. However, the power to detect these variants depends on the number of individuals whose phenotypes are collected, and for phenotypes that are difficult to collect, the sample size might be insufficient to achieve the desired statistical power. The phenotype of interest is often difficult to collect, whereas surrogate phenotypes or related phenotypes are easier to collect and have already been collected in very large samples. This paper demonstrates how we take advantage of these additional related phenotypes to impute the phenotype of interest or target phenotype and then perform association analysis. Our approach leverages the correlation structure between phenotypes to perform the imputation. The correlation structure can be estimated from a smaller complete dataset for which both the target and related phenotypes have been collected. Under some assumptions, the statistical power can be computed analytically given the correlation structure of the phenotypes used in imputation. In addition, our method can impute the summary statistic of the target phenotype as a weighted linear combination of the summary statistics of related phenotypes. Thus, our method is applicable to datasets for which we have access only to summary statistics and not to the raw genotypes. We illustrate our approach by analyzing associated loci to triglycerides (TGs), body mass index (BMI), and systolic blood pressure (SBP) in the Northern Finland Birth Cohort dataset.
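
The idea of imputing a target phenotype as a weighted linear combination of related phenotypes can be sketched with the conditional mean of a multivariate normal distribution: for standardized phenotypes, the weights solve (correlation among related phenotypes) × w = (correlation of each related phenotype with the target). The sketch below assumes that model; it is not necessarily the exact estimator used in the paper, and the function names and correlation values are hypothetical.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def imputation_weights(corr_related, corr_with_target):
    """Weights w such that the imputed target is sum(w_i * related_i)."""
    return solve_linear_system(corr_related, corr_with_target)


def impute_target(weights, related_values):
    """Impute one standardized target value from standardized related values."""
    return sum(w * v for w, v in zip(weights, related_values))


# Hypothetical correlations, as if estimated from a small complete dataset.
corr_related = [[1.0, 0.2], [0.2, 1.0]]   # between the two related phenotypes
corr_with_target = [0.5, 0.3]             # each related phenotype vs. target

w = imputation_weights(corr_related, corr_with_target)
imputed = impute_target(w, [1.0, -1.0])
```

Under this model the quantity `sum(w_i * corr_with_target_i)` gives the squared correlation between imputed and true target values, which is one way to see why the achievable power depends on the correlation structure.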


Subjects
Genome-Wide Association Study/methods, Phenotype, Animals, Blood Pressure/genetics, Body Mass Index, Cohort Studies, Datasets as Topic, Finland, Genotype, Humans, Mice, Genetic Models, Multifactorial Inheritance, Reproducibility of Results, Research Design, Sample Size, Triglycerides/blood
16.
J Gastroenterol Hepatol ; 34(10): 1777-1783, 2019 Oct.
Article in English | MEDLINE | ID: mdl-31038770

ABSTRACT

BACKGROUND AND AIM: Tobacco smoking is a risk factor for gastrointestinal disorders, causing mucosal damage and impairing immune responses. However, smoking has been found to be protective against ulcerative colitis (UC). Human leukocyte antigen (HLA) is a major susceptibility locus for UC, and HLA-DRB1*15:02 has the strongest effect in Asians. This study investigated the effects of smoking on the association between HLA and UC. METHODS: The study enrolled 882 patients with UC, including 526 never, 151 current, and 205 former smokers, and 3091 healthy controls, including 2124 never, 502 current, and 465 former smokers. Smoking-stratified analyses of HLA data were performed using a case-control approach. RESULTS: HLA-DRB1*15:02 was associated with UC in never smokers (OR = 3.20, P = 7.88 × 10⁻²³) but not in current or former smokers (P = 0.72 and P = 0.33, respectively). In current smokers, HLA-DQB1*06 was associated with UC (OR = 2.59, P = 6.39 × 10⁻¹²). No variants reached genome-wide significance in former smokers. CONCLUSIONS: An association between UC and HLA-DRB1*15:02 was limited to never smokers. Our findings highlight that tobacco smoking modifies the effects of HLA on the risk of UC.


Subjects
Ulcerative Colitis/genetics, Gene-Environment Interaction, HLA-DRB1 Chains/genetics, Non-Smokers, Smokers, Smoking/genetics, Adult, Aged, Case-Control Studies, Ulcerative Colitis/diagnosis, Ulcerative Colitis/immunology, Female, HLA-DRB1 Chains/immunology, Humans, Male, Middle Aged, Risk Assessment, Risk Factors, Smoking/adverse effects, Smoking/immunology, Smoking Cessation
17.
Hum Mol Genet ; 25(9): 1857-66, 2016 05 01.
Article in English | MEDLINE | ID: mdl-26908615

ABSTRACT

Meta-analysis strategies have become critical to augment power of genome-wide association studies (GWAS). To reduce genotyping or sequencing cost, many studies today utilize shared controls, and these individuals can inadvertently overlap among multiple studies. If these overlapping individuals are not taken into account in meta-analysis, they can induce spurious associations. In this article, we propose a general framework for adjusting association statistics to account for overlapping subjects within a meta-analysis. The key idea of our method is to transform the covariance structure of the data, so it can be used in downstream analyses. As a result, the strategy is very flexible and allows a wide range of meta-analysis methods, such as the random effects model, to account for overlapping subjects. Using simulations and real datasets, we demonstrate that our method has utility in meta-analyses of GWAS, as well as in a multi-tissue mouse expression quantitative trait loci (eQTL) study where our method increases the number of discovered eQTL by up to 19% compared with existing methods.


Subjects
Disease/genetics, Genome-Wide Association Study/methods, Meta-Analysis as Topic, Single Nucleotide Polymorphism/genetics, Quantitative Trait Loci/genetics, Animals, Case-Control Studies, Gene Expression Profiling, Humans, Mice, Theoretical Models
18.
Am J Hum Genet ; 97(1): 139-52, 2015 Jul 02.
Article in English | MEDLINE | ID: mdl-26140449

ABSTRACT

Identifying genomic annotations that differentiate causal from trait-associated variants is essential to fine mapping disease loci. Although many studies have identified non-coding functional annotations that overlap disease-associated variants, these annotations often colocalize, complicating the ability to use these annotations for fine mapping causal variation. We developed a statistical approach (Genomic Annotation Shifter [GoShifter]) to assess whether enriched annotations are able to prioritize causal variation. GoShifter defines the null distribution of an annotation overlapping an allele by locally shifting annotations; this approach is less sensitive to biases arising from local genomic structure than commonly used enrichment methods that depend on SNP matching. Local shifting also allows GoShifter to identify independent causal effects from colocalizing annotations. Using GoShifter, we confirmed that variants in expression quantitative trait loci drive gene-expression changes through DNase-I hypersensitive sites (DHSs) near transcription start sites and independently through 3' UTR regulation. We also showed that (1) 15%-36% of trait-associated loci map to DHSs independently of other annotations; (2) loci associated with breast cancer and rheumatoid arthritis harbor potentially causal variants near the summits of histone marks rather than full peak bodies; (3) variants associated with height are highly enriched in embryonic stem cell DHSs; and (4) we can effectively prioritize causal variation at specific loci.


Subjects
Gene Expression Regulation/genetics , Genetic Variation , Genome, Human/genetics , Molecular Sequence Annotation/methods , Quantitative Trait Loci/genetics , Arthritis, Rheumatoid/genetics , Breast Neoplasms/genetics , Histones/genetics , Histones/metabolism , Humans
19.
Am J Hum Genet ; 96(6): 857-68, 2015 Jun 04.
Article in English | MEDLINE | ID: mdl-26027500

ABSTRACT

In studies of expression quantitative trait loci (eQTLs), it is of increasing interest to identify eGenes, the genes whose expression levels are associated with variation at a particular genetic variant. Detecting eGenes is important for follow-up analyses and prioritization because genes are the main entities in biological processes. To detect eGenes, one typically focuses on the genetic variant with the minimum p value among all variants in cis with a gene and corrects for multiple testing to obtain a gene-level p value. For performing multiple-testing correction, a permutation test is widely used. Because of growing sample sizes of eQTL studies, however, the permutation test has become a computational bottleneck in eQTL studies. In this paper, we propose an efficient approach for correcting for multiple testing and assessing eGene p values by utilizing a multivariate normal distribution. Our approach properly takes into account the linkage-disequilibrium structure among variants, and its time complexity is independent of sample size. By applying our small-sample correction techniques, our method achieves high accuracy in both small and large studies. We have shown that our method consistently produces extremely accurate p values (accuracy > 98%) for three human eQTL datasets with different sample sizes and SNP densities: the Genotype-Tissue Expression pilot dataset, the multi-region brain dataset, and the HapMap 3 dataset.
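
A minimal sketch of the multivariate-normal idea follows, under stated assumptions: the null z scores of the cis variants are treated as jointly normal with the LD correlation matrix as covariance, and the gene-level p value is estimated by Monte Carlo as the probability that the largest absolute null z score reaches the observed maximum. The small-sample correction mentioned in the abstract is omitted, `ld` must be positive definite, and the function names are hypothetical rather than taken from the authors' software.

```python
import math
import random

def cholesky(r):
    """Lower-triangular Cholesky factor of a positive-definite matrix."""
    n = len(r)
    l = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(l[i][k] * l[j][k] for k in range(j))
            if i == j:
                l[i][j] = math.sqrt(r[i][i] - s)
            else:
                l[i][j] = (r[i][j] - s) / l[j][j]
    return l

def gene_level_pvalue(z_obs, ld, n_draws=20000, seed=0):
    """Monte Carlo estimate of P(max |Z| >= observed max |z|) with Z ~ MVN(0, ld)."""
    rng = random.Random(seed)
    l = cholesky(ld)
    n = len(z_obs)
    threshold = max(abs(z) for z in z_obs)
    exceed = 0
    for _ in range(n_draws):
        e = [rng.gauss(0.0, 1.0) for _ in range(n)]
        # Correlate the independent draws: z = L e has covariance L L^T = ld.
        z = [sum(l[i][k] * e[k] for k in range(i + 1)) for i in range(n)]
        if max(abs(v) for v in z) >= threshold:
            exceed += 1
    return (exceed + 1) / (n_draws + 1)

# Three variants in strong LD versus three independent variants, same observed z scores.
z_scores = [2.5, 2.1, 1.8]
strong_ld = [[1.0, 0.95, 0.95], [0.95, 1.0, 0.95], [0.95, 0.95, 1.0]]
no_ld = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
print(gene_level_pvalue(z_scores, strong_ld), gene_level_pvalue(z_scores, no_ld))
```

The cost of each draw depends only on the number of variants, not on the number of individuals, which mirrors the abstract's point that the running time is independent of sample size.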


Subjects
Data Interpretation, Statistical , Gene Expression Regulation/genetics , Genes/genetics , Genetic Variation , Quantitative Trait Loci/genetics , Humans , Multivariate Analysis , Normal Distribution , Polymorphism, Single Nucleotide/genetics , Probability , Sample Size , Statistics, Nonparametric
20.
Bioinformatics ; 33(24): 3947-3954, 2017 Dec 15.
Article in English | MEDLINE | ID: mdl-29036405

ABSTRACT

MOTIVATION: In genetic association studies, meta-analyses are widely used to increase statistical power by aggregating information from multiple studies. In meta-analyses, participating studies often share the same individuals because of the shared use of publicly available control data or accidental recruitment of the same subjects. Because such overlap can inflate the false-positive rate, overlapping subjects are traditionally split among the studies prior to meta-analysis, which requires access to genotype data and is not always possible. Fortunately, recently developed meta-analysis methods can systematically account for overlapping subjects at the summary statistics level. RESULTS: We identify and report a phenomenon in which these methods for overlapping subjects can yield low power. For instance, in our simulation involving a meta-analysis of five studies that share 20% of individuals, whereas the traditional splitting method achieved 80% power, none of the new methods exceeded 32% power. We found that this low power resulted from the unaccounted differences between shared and unshared individuals in terms of their contributions towards the final statistic. Here, we propose an optimal summary-statistic-based method termed FOLD that increases the power of meta-analysis involving studies with overlapping subjects. AVAILABILITY AND IMPLEMENTATION: Our method is available at http://software.buhmhan.com/FOLD. CONTACT: buhm.han@amc.seoul.kr. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Computational Biology/methods , Genome-Wide Association Study/methods , Meta-Analysis as Topic , Genotype , Humans , Software