Results 1 - 20 of 88
1.
Hum Mol Genet ; 32(4): 677-684, 2023 01 27.
Article in English | MEDLINE | ID: mdl-36164742

ABSTRACT

Crohn's disease (CD) and ulcerative colitis (UC), two major subtypes of inflammatory bowel disease, show substantial differences in their clinical course and treatment response. To identify the genetic factors underlying the distinct characteristics of these two diseases, we performed a genome-wide association study (GWAS) between CD (n = 2359) and UC (n = 2175) in a Korean population, followed by replication in an independent sample of 772 CD and 619 UC cases. Two novel loci were identified with divergent effects on CD and UC: rs9842650 in CD200 and rs885026 in NCOR2. In addition, the seven established susceptibility loci [major histocompatibility complex (MHC), TNFSF15, OTUD3, USP12, IL23R, FCHSD2 and RIPK2] reached genome-wide significance. Of the nine loci, six (MHC, TNFSF15, OTUD3, USP12, IL23R and CD200) were replicated in the case-case GWAS of European populations. The proportion of variance explained in CD-UC status by polygenic risk score analysis was up to 22.6%. The area under the receiver-operating characteristic curve value was 0.74, suggesting acceptable discrimination between CD and UC. This CD-UC GWAS provides new insights into genetic differences between the two diseases with similar symptoms and might be useful in improving their diagnosis and treatment.


Subject(s)
Ulcerative Colitis , Crohn Disease , Humans , Ulcerative Colitis/genetics , Crohn Disease/genetics , Genome-Wide Association Study , Genetic Predisposition to Disease , Genetic Loci , Single Nucleotide Polymorphism/genetics , Tumor Necrosis Factor Ligand Superfamily Member 15/genetics , Carrier Proteins/genetics , Membrane Proteins/genetics , Ubiquitin-Specific Proteases/genetics
2.
Am J Hum Genet ; 109(11): 1974-1985, 2022 11 03.
Article in English | MEDLINE | ID: mdl-36206757

ABSTRACT

The analysis of single-cell RNA-sequencing (scRNA-seq) data almost always begins with generating a low-dimensional embedding of the data by principal-component analysis (PCA). Because scRNA-seq data are count data, log transformation is routinely applied to correct skewness prior to PCA, a step that is often argued to introduce bias. Alternatively, studies have proposed methods that directly assume a count model and use approximately normally distributed count residuals for PCA. Despite their theoretical advantage of directly modeling count data, these methods are extremely slow for large datasets. In fact, as the data size grows, even the standard log normalization becomes inefficient. Here, we present FastRNA, a highly efficient solution for PCA of scRNA-seq data based on a count model accounting for both batches and cell size factors. Although we assume the same general count model as previous methods, our method uses two orders of magnitude less time and memory than the other count-based methods and an order of magnitude less time and memory than the standard log normalization. This achievement results from our unique algebraic optimization, which completely avoids forming the large dense residual matrix in memory. In addition, our method has the benefit that batch effects are eliminated from the data prior to PCA. Generating batch-accounted PCs of an atlas-scale dataset with 2 million cells takes less than a minute and 1 GB of memory with our method.
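
The idea of avoiding a dense intermediate matrix can be illustrated with a much simpler case than the count model described above. The sketch below is not FastRNA's algorithm: it assumes plain mean-centering instead of count-model residuals, and the function name `centered_gram` and the dict-per-cell sparse layout are hypothetical. It computes the gene-by-gene Gram matrix of the centered data from the sparse counts alone, using the identity (X - 1μᵀ)ᵀ(X - 1μᵀ) = XᵀX - n μμᵀ.

```python
def centered_gram(rows, n_cells, n_genes):
    """Gram matrix of column-centered data, computed from sparse rows.

    rows: one dict {gene_index: count} per cell; missing genes are zeros.
    The centered (dense) cell-by-gene matrix is never materialized.
    """
    col_sum = [0.0] * n_genes
    gram = [[0.0] * n_genes for _ in range(n_genes)]
    for cell in rows:
        items = list(cell.items())
        for j, x in items:
            col_sum[j] += x
            # Only nonzero pairs contribute to X^T X.
            for k, y in items:
                gram[j][k] += x * y
    mean = [s / n_cells for s in col_sum]
    # Rank-one correction replaces explicit centering: X^T X - n * mu mu^T.
    for j in range(n_genes):
        for k in range(n_genes):
            gram[j][k] -= n_cells * mean[j] * mean[k]
    return gram


# Three cells, three genes, mostly zeros.
cells = [{0: 2, 1: 1}, {1: 3}, {0: 4, 2: 1}]
gram = centered_gram(cells, n_cells=3, n_genes=3)
```

The eigenvectors of this small gene-by-gene matrix give the principal axes, so memory scales with the number of nonzero counts plus the square of the number of genes, not with cells times genes.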


Subject(s)
RNA , Single-Cell Analysis , Humans , RNA Sequence Analysis/methods , Single-Cell Analysis/methods , Principal Component Analysis , Exome Sequencing , Gene Expression Profiling
3.
Bioinformatics ; 40(8)2024 08 02.
Article in English | MEDLINE | ID: mdl-39115884

ABSTRACT

MOTIVATION: Generalized linear mixed models (GLMMs), such as the negative-binomial or Poisson linear mixed model, are widely applied to single-cell RNA sequencing data to compare transcript expression between different conditions determined at the subject level. However, the model is computationally intensive, and its statistical performance relative to pseudobulk approaches is poorly understood. RESULTS: We propose offset-pseudobulk as a lightweight alternative to GLMMs. We prove that a count-based pseudobulk equipped with a proper offset variable has the same statistical properties as GLMMs in terms of both point estimates and standard errors. We confirm our findings using simulations based on real data. Offset-pseudobulk is substantially faster (>10×) and numerically more stable than GLMMs. AVAILABILITY AND IMPLEMENTATION: Offset-pseudobulk can be easily implemented in any generalized linear model software by tweaking a few options. The code can be found at https://github.com/hanbin973/pseudobulk_is_mm.
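
A minimal sketch of the pseudobulk-with-offset idea follows. It is not the authors' code (see the repository above for that). It assumes the offset is the log of the summed per-cell size factors within each subject, which is one plausible reading of "a proper offset variable"; the function names and the toy data are hypothetical. For a Poisson model with an offset and a single binary condition, the maximum-likelihood log fold change has a closed form, so no GLM library is needed here.

```python
import math
from collections import defaultdict


def pseudobulk(cells):
    """Sum counts per subject and build a log offset per subject.

    cells: iterable of (subject_id, count, size_factor) for one gene.
    Assumption: offset = log(sum of size factors of the subject's cells).
    """
    total = defaultdict(float)
    size = defaultdict(float)
    for subject, count, size_factor in cells:
        total[subject] += count
        size[subject] += size_factor
    return {s: (total[s], math.log(size[s])) for s in total}


def poisson_log_fold_change(bulk, condition):
    """Closed-form Poisson MLE of the condition effect given offsets.

    With log(mu) = offset + b0 + b1 * x and binary x, the fitted rate in
    each group is (sum of counts) / (sum of exp(offset)).
    """
    counts = {0: 0.0, 1: 0.0}
    exposure = {0: 0.0, 1: 0.0}
    for subject, (y, offset) in bulk.items():
        group = condition[subject]
        counts[group] += y
        exposure[group] += math.exp(offset)
    return math.log(counts[1] / exposure[1]) - math.log(counts[0] / exposure[0])


cells = [("A", 3, 1.0), ("A", 5, 1.0), ("B", 2, 0.5), ("B", 2, 1.5)]
bulk = pseudobulk(cells)
lfc = poisson_log_fold_change(bulk, {"A": 1, "B": 0})
```

In practice the same aggregated counts and offsets would be passed to a generalized linear model routine that accepts an offset term, which also yields standard errors.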


Subject(s)
Single-Cell Analysis , Single-Cell Analysis/methods , Linear Models , Software , Case-Control Studies , RNA Sequence Analysis/methods , Humans , Algorithms
4.
BMC Bioinformatics ; 25(1): 24, 2024 Jan 12.
Article in English | MEDLINE | ID: mdl-38216869

ABSTRACT

BACKGROUND: Meta-analysis is a statistical method that combines the results of multiple studies to increase statistical power. When multiple studies participating in a meta-analysis use the same public dataset as controls, the summary statistics from these studies become correlated. To address this, Lin and Sullivan proposed a method that provides an optimal test statistic adjusted for the correlation. This method quickly became the standard practice. However, we identified an unexpected power asymmetry phenomenon in this standard framework, which can lead to unbalanced power for detecting protective minor alleles and risk minor alleles. RESULTS: We found that the power asymmetry of the current framework is mainly due to errors in approximating the correlation term. We then developed a meta-analysis method based on an accurate correlation estimator, called PASTRY (A method to avoid Power ASymmeTRY). PASTRY outperformed the standard method on both simulated and real datasets in terms of power symmetry. CONCLUSIONS: Our findings suggest that PASTRY can help to alleviate the power asymmetry problem. PASTRY is available at https://github.com/hanlab-SNU/PASTRY.
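
To show why the correlation term matters, here is a generic sketch of a weighted z-score meta-analysis that accounts for correlated studies. It is neither PASTRY nor the Lin and Sullivan estimator: the correlation matrix is simply taken as an input, and estimating it accurately is exactly what those methods address. The function name `combine_z` is hypothetical.

```python
import math


def combine_z(z, weights, corr):
    """Combine study z-scores whose null correlation matrix is `corr`.

    The weighted sum has variance w^T R w under the null, so dividing by
    its square root gives a statistic with unit variance.
    """
    n = len(z)
    numerator = sum(w * zi for w, zi in zip(weights, z))
    variance = sum(
        weights[i] * weights[j] * corr[i][j] for i in range(n) for j in range(n)
    )
    return numerator / math.sqrt(variance)


# Two studies with equal weights and identical z-scores.
independent = combine_z([2.0, 2.0], [1.0, 1.0], [[1.0, 0.0], [0.0, 1.0]])
shared_controls = combine_z([2.0, 2.0], [1.0, 1.0], [[1.0, 0.5], [0.5, 1.0]])
```

Ignoring a positive correlation between studies overstates the combined statistic, and any error in the assumed correlation propagates directly into the test.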


Subject(s)
Genome-Wide Association Study , Single Nucleotide Polymorphism , Humans , Genome-Wide Association Study/methods , Alleles , Research
5.
Hum Mol Genet ; 31(22): 3934-3944, 2022 11 10.
Article in English | MEDLINE | ID: mdl-35512355

ABSTRACT

Genome-wide association studies (GWAS) of Crohn's disease (CD) in European populations and of leprosy in Chinese populations have shown that CD and leprosy share genetic risk loci. As these shared loci were identified through cross-comparisons across different ethnic populations, we hypothesized that meta-analysis of GWAS on CD and leprosy in East Asian populations would increase power to identify additional shared loci. We performed a cross-disease meta-analysis of GWAS data from CD (1621 cases and 4419 controls) and leprosy (2901 cases and 3801 controls), followed by replication in additional datasets comprising 738 CD cases and 488 controls and 842 leprosy cases and 925 controls. We identified one novel locus at 7p22.3, rs77992257 in intron 2 of ADAP1, shared between CD and leprosy with genome-wide significance (P = 3.80 × 10⁻¹¹) and confirmed 10 previously established loci in both diseases: IL23R, IL18RAP, IL12B, RIPK2, TNFSF15, ZNF365-EGR2, CCDC88B, LACC1, IL27 and NOD2. Polygenic risk scores derived from Chinese leprosy data explained up to 5.28% of the variance of Korean CD, supporting similar genetic structures between the two diseases. Although CD and leprosy shared a substantial number of genetic susceptibility loci in East Asians, the majority of shared susceptibility loci showed allelic effects in the opposite direction. Investigation of the genetic correlation using cross-trait linkage disequilibrium score regression also showed a negative genetic correlation between CD and leprosy (rg [SE] = -0.40 [0.13], P = 2.6 × 10⁻³). These observations raise the possibility that CD might be caused by hyper-sensitive reactions toward pathogenic stimuli.


Subject(s)
Crohn Disease , Leprosy , Humans , Genome-Wide Association Study , Crohn Disease/genetics , Genetic Predisposition to Disease , Single Nucleotide Polymorphism/genetics , Asian People/genetics , Genetic Loci , Leprosy/genetics , Case-Control Studies , Tumor Necrosis Factor Ligand Superfamily Member 15/genetics
6.
Hum Mol Genet ; 31(15): 2655-2667, 2022 08 17.
Article in English | MEDLINE | ID: mdl-35043955

ABSTRACT

Human leukocyte antigen (HLA) gene variants in the major histocompatibility complex (MHC) region are associated with numerous complex human diseases and quantitative traits. Previous phenome-wide association studies (PheWAS) for this region demonstrated that HLA association patterns to the phenome have both population-specific and population-shared components. We performed MHC PheWAS in the Korean population by analyzing associations between phenotypes and genetic variants in the MHC region using the Korea Biobank Array project data samples from the Korean Genome and Epidemiology Study cohorts. Using this single-population dataset, we curated and analyzed 82 phenotypes for 125 673 Korean individuals after imputing HLA using CookHLA, a recently developed imputation framework. More than one-third of these phenotypes showed significant associations, confirming 56 known associations and discovering 13 novel association signals that were not reported previously. In addition, we analyzed heritability explained by the variants in the MHC region and genetic correlations among phenotypes based on the MHC variants.


Subject(s)
Genome-Wide Association Study , Single Nucleotide Polymorphism , Asian People/genetics , Genetic Predisposition to Disease , Humans , Major Histocompatibility Complex/genetics , Phenomics , Phenotype , Single Nucleotide Polymorphism/genetics
7.
Am J Hum Genet ; 108(1): 36-48, 2021 01 07.
Article in English | MEDLINE | ID: mdl-33352115

ABSTRACT

Identifying and interpreting pleiotropic loci is essential to understanding the shared etiology among diseases and complex traits. A common approach to mapping pleiotropic loci is to meta-analyze GWAS summary statistics across multiple traits. However, this strategy does not account for the complex genetic architectures of traits, such as genetic correlations and heritabilities. Furthermore, the interpretation is challenging because phenotypes often have different characteristics and units. We propose PLEIO (Pleiotropic Locus Exploration and Interpretation using Optimal test), a summary-statistic-based framework to map and interpret pleiotropic loci in a joint analysis of multiple diseases and complex traits. Our method maximizes power by systematically accounting for genetic correlations and heritabilities of the traits in the association test. Any set of related phenotypes, binary or quantitative traits with different units, can be combined seamlessly. In addition, our framework offers interpretation and visualization tools to help downstream analyses. Using our method, we combined 18 traits related to cardiovascular disease and identified 13 pleiotropic loci, which showed four different patterns of associations.


Subject(s)
Genetic Pleiotropy/genetics , Genome-Wide Association Study/methods , Cardiovascular Diseases/genetics , Genetic Predisposition to Disease/genetics , Humans , Phenotype , Single Nucleotide Polymorphism/genetics , Quantitative Trait Loci/genetics
8.
J Autoimmun ; 145: 103206, 2024 May.
Article in English | MEDLINE | ID: mdl-38554656

ABSTRACT

Crohn's disease (CD) is a chronic inflammatory disorder affecting the bowel wall. Tissue-resident memory T (Trm) cells are implicated in CD, yet their characteristics remain unclear. We aimed to investigate the transcriptional profiles and functional characteristics of Trm cells in the small bowel of CD and their interactions with immune cells. Seven patients with CD and four with ulcerative colitis as controls were included. Single-cell RNA sequencing and paired T cell receptor sequencing assessed T cell subsets and transcriptional signatures in lamina propria (LP) and submucosa/muscularis propria-enriched fractions (SM/MP) from small bowel tissue samples. We detected 58,123 T cells grouped into 16 populations, including the CD4+ Trm cells with a Th17 signature and CD8+ Trm clusters. In CD, CD4+ Trm cells with a Th17 signature, termed Th17 Trm, showed significantly increased proportions within both the LP and SM/MP areas. The Th17 Trm cluster demonstrated heightened expression of tissue-residency marker genes (ITGAE, ITGA1, and CXCR6) along with elevated levels of IL17A, IL22, CCR6, and CCL20. The clonal expansion of Th17 Trm cells in CD was accompanied by enhanced transmural dynamic potential, as indicated by significantly higher migration scores. CD-prominent Th17 Trm cells displayed an increased interferon gamma (IFNγ)-related signature possibly linked with STAT1 activation, inducing chemokines (i.e., CXCL10, CXCL8, and CXCL9) in myeloid cells. Our findings underscored the elevated Th17 Trm cells throughout the small bowel in CD, contributing to disease pathogenesis through IFNγ induction and subsequent chemokine production in myeloid cells.


Subject(s)
Crohn Disease , Immunologic Memory , Memory T Cells , Th17 Cells , Humans , Crohn Disease/immunology , Crohn Disease/genetics , Crohn Disease/pathology , Th17 Cells/immunology , Th17 Cells/metabolism , Memory T Cells/immunology , Memory T Cells/metabolism , Male , Female , Adult , Middle Aged , Intestinal Mucosa/immunology , Intestinal Mucosa/metabolism , Intestinal Mucosa/pathology , T-Lymphocyte Subsets/immunology , T-Lymphocyte Subsets/metabolism , Biomarkers , Gene Expression Profiling , Young Adult
9.
Nucleic Acids Res ; 50(12): e71, 2022 07 08.
Article in English | MEDLINE | ID: mdl-35420135

ABSTRACT

The standard analysis pipeline for single-cell RNA-seq data consists of sequential steps initiated by clustering the cells. An innate limitation of this pipeline is that an imperfect clustering result can irreversibly affect the succeeding steps. For example, there can be cell types not well distinguished by clustering because they largely share the global structure, such as the anterior primitive streak and mid primitive streak cells. If one searches for differentially expressed genes (DEGs) solely based on clustering, marker genes for distinguishing these types will be missed. Moreover, clustering depends on many parameters and is often subject to manual decisions. To overcome these limitations, we propose MarcoPolo, a method that identifies informative DEGs independently of prior clustering. MarcoPolo sorts out genes by evaluating whether their distributions are bimodal, whether similar expression patterns are observed in other genes, and whether the expressing cells are proximal in a low-dimensional space. Using real datasets with FACS-purified cell labels, we demonstrate that MarcoPolo recovers marker genes better than competing methods. Notably, MarcoPolo finds key genes that can distinguish cell types that are not distinguishable by the standard clustering. MarcoPolo is built as a convenient software package that provides analysis results in an HTML file.


Subject(s)
Single-Cell Analysis , Software , Algorithms , Biomarkers , Cluster Analysis , Gene Expression Profiling/methods , RNA-Seq , RNA Sequence Analysis/methods , Single-Cell Analysis/methods , Exome Sequencing
10.
PLoS Genet ; 17(6): e1009596, 2021 06.
Article in English | MEDLINE | ID: mdl-34061836

ABSTRACT

The rapid decrease in sequencing cost has enabled genetic studies to discover rare variants associated with complex diseases and traits. Once such an association is identified, the next step is to understand the mechanism by which the rare variants influence disease. As has been hypothesized for common variants, rare variants may affect diseases by regulating gene expression, and recently, several studies have identified the effects of rare variants on gene expression using heritability and expression outlier analyses. However, identifying individual genes whose expression is regulated by rare variants has been challenging due to the relatively small sample size of expression quantitative trait loci studies and statistical approaches not optimized to detect the effects of rare variants. In this study, we analyze whole-genome sequencing and RNA-seq data of 681 European individuals collected for the Genotype-Tissue Expression (GTEx) project (v8) to identify individual genes in 49 human tissues whose expression is regulated by rare variants. To improve statistical power, we develop an approach based on a likelihood ratio test that combines effects of multiple rare variants in a nonlinear manner and has higher power than previous approaches. Using GTEx data, we identify many genes regulated by rare variants, some of which are regulated only by rare variants and not by common variants. We also find that genes regulated by rare variants are enriched for expression outliers and disease-causing genes. These results point to regulatory effects of rare variants, which would be important in interpreting associations of rare variants with complex traits.


Subject(s)
Gene Expression Regulation , Quantitative Trait Loci , Humans , Multifactorial Inheritance
11.
Bioinformatics ; 37(3): 416-418, 2021 04 20.
Article in English | MEDLINE | ID: mdl-32735319

ABSTRACT

SUMMARY: Fine-mapping human leukocyte antigen (HLA) genes involved in disease susceptibility to individual alleles or amino acid residues has been challenging. Using information regarding HLA alleles obtained from HLA typing, HLA imputation or HLA inference, our software expands the alleles to amino acid sequences using the most recent IMGT/HLA database and prepares a dataset suitable for fine-mapping analysis. Our software also provides useful functionalities, such as various association tests, visualization tools and nomenclature conversion. AVAILABILITY AND IMPLEMENTATION: https://github.com/WansonChoi/HATK.


Subject(s)
HLA Antigens , Software , Alleles , Amino Acid Sequence , Chromosome Mapping , Genetic Predisposition to Disease , HLA Antigens/genetics , Histocompatibility Testing , Humans
12.
Am J Physiol Lung Cell Mol Physiol ; 321(1): L130-L143, 2021 07 01.
Article in English | MEDLINE | ID: mdl-33909500

ABSTRACT

Genome-wide association studies (GWASs) have identified regions associated with chronic obstructive pulmonary disease (COPD). GWASs of other diseases have shown an approximately 10-fold overrepresentation of nonsynonymous variants, despite limited exonic coverage on genotyping arrays. We hypothesized that a large-scale analysis of coding variants could discover novel genetic associations with COPD, including rare variants with large effect sizes. We performed a meta-analysis of exome arrays from 218,399 controls and 33,851 moderate-to-severe COPD cases. All exome-wide significant associations were present in regions previously identified by GWAS. We did not identify any novel rare coding variants with large effect sizes. Within GWAS regions on chromosomes 5q, 6p, and 15q, four coding variants were conditionally significant (P < 0.00015) when adjusting for lead GWAS single-nucleotide polymorphisms. A common gasdermin B (GSDMB) splice variant (rs11078928) previously associated with a decreased risk for asthma was nominally associated with a decreased risk for COPD [minor allele frequency (MAF) = 0.46, P = 1.8e-4]. Two stop variants in coiled-coil α-helical rod protein 1 (CCHCR1), a gene involved in regulating cell proliferation, were associated with COPD (both P < 0.0001). The SERPINA1 Z allele was associated with a random-effects odds ratio of 1.43 for COPD (95% confidence interval = 1.17-1.74), though with marked heterogeneity across studies. Overall, COPD-associated exonic variants were identified in genes involved in DNA methylation, cell-matrix interactions, cell proliferation, and cell death. In conclusion, we performed the largest exome array meta-analysis of COPD to date and identified potential functional coding variants. Future studies are needed to identify rarer variants and further define the role of coding variants in COPD pathogenesis.


Subject(s)
Exome/genetics , Genetic Markers , Genetic Predisposition to Disease , Genome-Wide Association Study , Single Nucleotide Polymorphism , Chronic Obstructive Pulmonary Disease/genetics , Chronic Obstructive Pulmonary Disease/pathology , Gene Expression Regulation , Humans , Meta-Analysis as Topic
13.
Hum Mol Genet ; 28(20): 3498-3513, 2019 10 15.
Article in English | MEDLINE | ID: mdl-31211845

ABSTRACT

Many immune diseases occur at different rates among people with schizophrenia compared to the general population. Here, we evaluated whether this phenomenon might be explained by shared genetic risk factors. We used data from large genome-wide association studies to compare the genetic architecture of schizophrenia to 19 immune diseases. First, we evaluated the association with schizophrenia of 581 variants previously reported to be associated with immune diseases at genome-wide significance. We identified five variants with potentially pleiotropic effects. While colocalization analyses were inconclusive, functional characterization of these variants provided the strongest evidence for a model in which genetic variation at rs1734907 modulates risk of schizophrenia and Crohn's disease via altered methylation and expression of EPHB4, a gene whose protein product guides the migration of neuronal axons in the brain and the migration of lymphocytes towards infected cells in the immune system. Next, we investigated genome-wide sharing of common variants between schizophrenia and immune diseases using cross-trait LD score regression. Of the 11 immune diseases with available genome-wide summary statistics, we observed genetic correlation between six immune diseases and schizophrenia: inflammatory bowel disease (rg = 0.12 ± 0.03, P = 2.49 × 10⁻⁴), Crohn's disease (rg = 0.097 ± 0.06, P = 3.27 × 10⁻³), ulcerative colitis (rg = 0.11 ± 0.04, P = 4.05 × 10⁻³), primary biliary cirrhosis (rg = 0.13 ± 0.05, P = 3.98 × 10⁻³), psoriasis (rg = 0.18 ± 0.07, P = 7.78 × 10⁻³) and systemic lupus erythematosus (rg = 0.13 ± 0.05, P = 3.76 × 10⁻³). With the exception of ulcerative colitis, the degree and direction of these genetic correlations were consistent with the expected phenotypic correlation based on epidemiological data. Our findings suggest shared genetic risk factors contribute to the epidemiological association of certain immune diseases and schizophrenia.


Subject(s)
Genetic Predisposition to Disease/genetics , Immune System Diseases/etiology , Immune System Diseases/genetics , Schizophrenia/etiology , Schizophrenia/genetics , Genome-Wide Association Study , Humans , Immune System Diseases/epidemiology , Single Nucleotide Polymorphism/genetics , Schizophrenia/epidemiology
14.
Hum Mol Genet ; 27(22): 3901-3910, 2018 11 15.
Article in English | MEDLINE | ID: mdl-30084967

ABSTRACT

Crohn's disease (CD) and ulcerative colitis (UC) are the major types of chronic inflammatory bowel disease (IBD) characterized by recurring episodes of inflammation of the gastrointestinal tract. Although it is well established that human leukocyte antigen (HLA) is a major risk factor for IBD, it is yet to be determined which HLA alleles or amino acids drive the risks of CD and UC in Asians. To define the roles of HLA for IBD in Asians, we fine-mapped HLA in 12 568 individuals from Korea and Japan (3294 patients with CD, 1522 patients with UC and 7752 controls). We identified that the amino acid position 37 of HLA-DRβ1 plays a key role in the susceptibility to CD (presence of serine being protective, P = 3.6 × 10⁻⁶⁷, OR = 0.48 [0.45-0.52]). For UC, we confirmed the known association of the haplotype spanning HLA-C*12:02, HLA-B*52:01 and HLA-DRB1*15:02 (P = 1.2 × 10⁻²⁸, OR = 4.01 [3.14-5.12]).


Subject(s)
Ulcerative Colitis/genetics , Crohn Disease/genetics , Genetic Predisposition to Disease , HLA-DRB1 Chains/genetics , Inflammatory Bowel Diseases/genetics , Alleles , Amino Acid Substitution/genetics , Amino Acids/chemistry , Amino Acids/genetics , Asian People/genetics , Ulcerative Colitis/pathology , Crohn Disease/pathology , Female , Genetic Association Studies , Genotype , HLA-DRB1 Chains/chemistry , Haplotypes/genetics , Humans , Inflammatory Bowel Diseases/pathology , Japan , Male , Protein Conformation , Republic of Korea
15.
J Proteome Res ; 18(8): 3195-3202, 2019 08 02.
Article in English | MEDLINE | ID: mdl-31314536

ABSTRACT

Deep learning (DL), a type of machine learning approach, is a powerful tool for analyzing large sets of data that are derived from biomedical sciences. However, it remains unknown whether DL is suitable for identifying contributing factors, such as biomarkers, in quantitative proteomics data. In this study, we describe an optimized DL-based analytical approach using a data set that was generated by selected reaction monitoring-mass spectrometry (SRM-MS), comprising SRM-MS data from 1008 samples for the diagnosis of pancreatic cancer, to test its classification power. Its performance was compared with that of 5 conventional multivariate and machine learning methods: random forest (RF), support vector machine (SVM), logistic regression (LR), k-nearest neighbors (k-NN), and naïve Bayes (NB). The DL method yielded the best classification (AUC 0.9472 for the test data set) of all approaches. We also optimized the parameters of DL individually to determine which factors were the most significant. In summary, the DL method has advantages in classifying the quantitative proteomics data of pancreatic cancer patients, and our results suggest that its implementation can improve the performance of diagnostic assays in clinical settings.
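
The AUC used above to compare classifiers can be computed directly from scores and labels. The sketch below is a generic illustration and is not taken from the study: it uses the rank interpretation of the AUC (the probability that a randomly chosen positive sample scores higher than a randomly chosen negative one, counting ties as one half). The function name `auc` and the toy scores are hypothetical.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison of scores.

    labels: 1 for positive (e.g. case), 0 for negative (e.g. control).
    Quadratic in sample size, which is fine for a small illustration.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(positives) * len(negatives))


# Three of the four positive/negative pairs are ranked correctly.
value = auc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0])  # 0.75
```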


Subject(s)
Deep Learning/statistics & numerical data , Machine Learning/statistics & numerical data , Mass Spectrometry/statistics & numerical data , Proteomics/statistics & numerical data , Algorithms , Bayes Theorem , Cluster Analysis , Humans , Logistic Models , Pancreatic Neoplasms/diagnosis , Pancreatic Neoplasms/pathology , Support Vector Machine/statistics & numerical data
16.
Am J Hum Genet ; 99(1): 89-103, 2016 Jul 07.
Article in English | MEDLINE | ID: mdl-27292110

ABSTRACT

Genome-wide association studies (GWASs) have been successful in detecting variants correlated with phenotypes of clinical interest. However, the power to detect these variants depends on the number of individuals whose phenotypes are collected, and for phenotypes that are difficult to collect, the sample size might be insufficient to achieve the desired statistical power. The phenotype of interest is often difficult to collect, whereas surrogate phenotypes or related phenotypes are easier to collect and have already been collected in very large samples. This paper demonstrates how we take advantage of these additional related phenotypes to impute the phenotype of interest or target phenotype and then perform association analysis. Our approach leverages the correlation structure between phenotypes to perform the imputation. The correlation structure can be estimated from a smaller complete dataset for which both the target and related phenotypes have been collected. Under some assumptions, the statistical power can be computed analytically given the correlation structure of the phenotypes used in imputation. In addition, our method can impute the summary statistic of the target phenotype as a weighted linear combination of the summary statistics of related phenotypes. Thus, our method is applicable to datasets for which we have access only to summary statistics and not to the raw genotypes. We illustrate our approach by analyzing loci associated with triglycerides (TGs), body mass index (BMI), and systolic blood pressure (SBP) in the Northern Finland Birth Cohort dataset.
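
The "weighted linear combination of summary statistics" can be sketched under a simplifying assumption that is mine, not the paper's: z-scores of the target and related phenotypes are treated as jointly normal with the phenotypic correlation matrix, so the weights are the conditional-expectation weights of a multivariate normal. The paper's exact weighting may differ. The function name `impute_z` is hypothetical, and only two related phenotypes are handled so that the 2×2 inverse can be written out by hand.

```python
def impute_z(z_related, corr_related, corr_target):
    """Impute a target z-score from two related phenotypes.

    corr_related: 2x2 correlation matrix R among the related phenotypes.
    corr_target:  correlations t of each related phenotype with the target.
    The weights solve R w = t; r2 = t^T w is the variance of the imputed
    statistic under the null.
    """
    (a, b), (c, d) = corr_related
    det = a * d - b * c
    t0, t1 = corr_target
    weights = [(d * t0 - b * t1) / det, (a * t1 - c * t0) / det]
    z_hat = weights[0] * z_related[0] + weights[1] * z_related[1]
    r2 = weights[0] * t0 + weights[1] * t1
    return weights, z_hat, r2


weights, z_hat, r2 = impute_z(
    z_related=[5.0, 1.0],
    corr_related=[[1.0, 0.0], [0.0, 1.0]],
    corr_target=[0.6, 0.0],
)
# Dividing z_hat by sqrt(r2) rescales it to unit variance under the null.
z_standardized = z_hat / r2 ** 0.5
```

The quantity `r2` also indicates how much power is retained: the weaker the correlation between the related and target phenotypes, the less informative the imputed statistic.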


Subject(s)
Genome-Wide Association Study/methods , Phenotype , Animals , Blood Pressure/genetics , Body Mass Index , Cohort Studies , Datasets as Topic , Finland , Genotype , Humans , Mice , Genetic Models , Multifactorial Inheritance , Reproducibility of Results , Research Design , Sample Size , Triglycerides/blood
17.
J Gastroenterol Hepatol ; 34(10): 1777-1783, 2019 Oct.
Article in English | MEDLINE | ID: mdl-31038770

ABSTRACT

BACKGROUND AND AIM: Tobacco smoking is a risk factor for gastrointestinal disorders, causing mucosal damage and impairing immune responses. However, smoking has been found to be protective against ulcerative colitis (UC). Human leukocyte antigen (HLA) is a major susceptibility locus for UC, and HLA-DRB1*15:02 has the strongest effect in Asians. This study investigated the effects of smoking on the association between HLA and UC. METHODS: The study enrolled 882 patients with UC, including 526 never, 151 current, and 205 former smokers, and 3091 healthy controls, including 2124 never, 502 current, and 465 former smokers. Smoking-stratified analyses of HLA data were performed using a case-control approach. RESULTS: HLA-DRB1*15:02 was associated with UC in never smokers (OR = 3.20, P = 7.88 × 10⁻²³) but not in current or former smokers (P = 0.72 and P = 0.33, respectively). In current smokers, HLA-DQB1*06 was associated with UC (OR = 2.59, P = 6.39 × 10⁻¹²). No variants reached genome-wide significance in former smokers. CONCLUSIONS: An association between UC and HLA-DRB1*15:02 was limited to never smokers. Our findings highlight that tobacco smoking modifies the effects of HLA on the risk of UC.


Subject(s)
Ulcerative Colitis/genetics , Gene-Environment Interaction , HLA-DRB1 Chains/genetics , Non-Smokers , Smokers , Smoking/genetics , Adult , Aged , Case-Control Studies , Ulcerative Colitis/diagnosis , Ulcerative Colitis/immunology , Female , HLA-DRB1 Chains/immunology , Humans , Male , Middle Aged , Risk Assessment , Risk Factors , Smoking/adverse effects , Smoking/immunology , Smoking Cessation
18.
Hum Mol Genet ; 25(9): 1857-66, 2016 05 01.
Article in English | MEDLINE | ID: mdl-26908615

ABSTRACT

Meta-analysis strategies have become critical to augment power of genome-wide association studies (GWAS). To reduce genotyping or sequencing cost, many studies today utilize shared controls, and these individuals can inadvertently overlap among multiple studies. If these overlapping individuals are not taken into account in meta-analysis, they can induce spurious associations. In this article, we propose a general framework for adjusting association statistics to account for overlapping subjects within a meta-analysis. The key idea of our method is to transform the covariance structure of the data, so it can be used in downstream analyses. As a result, the strategy is very flexible and allows a wide range of meta-analysis methods, such as the random effects model, to account for overlapping subjects. Using simulations and real datasets, we demonstrate that our method has utility in meta-analyses of GWAS, as well as in a multi-tissue mouse expression quantitative trait loci (eQTL) study where our method increases the number of discovered eQTL by up to 19% compared with existing methods.


Subject(s)
Disease/genetics , Genome-Wide Association Study/methods , Meta-Analysis as Topic , Polymorphism, Single Nucleotide/genetics , Quantitative Trait Loci/genetics , Animals , Case-Control Studies , Gene Expression Profiling , Humans , Mice , Models, Theoretical
19.
Am J Hum Genet ; 97(1): 139-52, 2015 Jul 02.
Article in English | MEDLINE | ID: mdl-26140449

ABSTRACT

Identifying genomic annotations that differentiate causal from trait-associated variants is essential to fine mapping disease loci. Although many studies have identified non-coding functional annotations that overlap disease-associated variants, these annotations often colocalize, complicating the ability to use these annotations for fine mapping causal variation. We developed a statistical approach (Genomic Annotation Shifter [GoShifter]) to assess whether enriched annotations are able to prioritize causal variation. GoShifter defines the null distribution of an annotation overlapping an allele by locally shifting annotations; this approach is less sensitive to biases arising from local genomic structure than commonly used enrichment methods that depend on SNP matching. Local shifting also allows GoShifter to identify independent causal effects from colocalizing annotations. Using GoShifter, we confirmed that variants in expression quantitative trait loci drive gene-expression changes through DNase-I hypersensitive sites (DHSs) near transcription start sites and independently through 3' UTR regulation. We also showed that (1) 15%-36% of trait-associated loci map to DHSs independently of other annotations; (2) loci associated with breast cancer and rheumatoid arthritis harbor potentially causal variants near the summits of histone marks rather than full peak bodies; (3) variants associated with height are highly enriched in embryonic stem cell DHSs; and (4) we can effectively prioritize causal variation at specific loci.
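The idea of building a null distribution by locally shifting annotations can be sketched as a toy example. The code below is a simplified illustration and not the GoShifter implementation. The data layout (a `window`, a list of `snps`, and a list of half-open `annotations` intervals per locus), the function names, and the circular wrap-around are all assumptions made for this sketch.

```python
import random

def overlaps(snps, intervals):
    """True if any SNP position falls inside any half-open [start, end) interval."""
    return any(start <= s < end for s in snps for start, end in intervals)

def shift_intervals(intervals, offset, lo, hi):
    """Shift intervals by offset inside the window [lo, hi), wrapping circularly.

    Assumes every interval lies within the window.
    """
    width = hi - lo
    shifted = []
    for start, end in intervals:
        s = lo + (start - lo + offset) % width
        e = s + (end - start)
        if e <= hi:
            shifted.append((s, e))
        else:
            # The interval runs past the window edge: split and wrap around.
            shifted.append((s, hi))
            shifted.append((lo, lo + (e - hi)))
    return shifted

def shifting_p_value(loci, n_iter=1000, seed=0):
    """Return (observed overlapping loci, empirical p-value under local shifting)."""
    rng = random.Random(seed)
    observed = sum(overlaps(l["snps"], l["annotations"]) for l in loci)
    at_least = 0
    for _ in range(n_iter):
        count = 0
        for l in loci:
            lo, hi = l["window"]
            offset = rng.randrange(hi - lo)
            if overlaps(l["snps"], shift_intervals(l["annotations"], offset, lo, hi)):
                count += 1
        if count >= observed:
            at_least += 1
    return observed, (at_least + 1) / (n_iter + 1)

# Hypothetical loci: positions are arbitrary coordinates, not real genomic data.
loci = [
    {"window": (0, 1000), "snps": [150, 160], "annotations": [(140, 170), (600, 650)]},
    {"window": (0, 2000), "snps": [900], "annotations": [(880, 930)]},
    {"window": (0, 1500), "snps": [300, 310], "annotations": [(1000, 1100)]},
]
print(shifting_p_value(loci))
```

Because each locus is shuffled only within its own window, the number and size of annotations near the tested SNPs stay fixed, which is how this style of null avoids matching SNPs on genome-wide properties.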


Subject(s)
Gene Expression Regulation/genetics , Genetic Variation , Genome, Human/genetics , Molecular Sequence Annotation/methods , Quantitative Trait Loci/genetics , Arthritis, Rheumatoid/genetics , Breast Neoplasms/genetics , Histones/genetics , Histones/metabolism , Humans
20.
Am J Hum Genet ; 96(6): 857-68, 2015 Jun 04.
Article in English | MEDLINE | ID: mdl-26027500

ABSTRACT

In studies of expression quantitative trait loci (eQTLs), it is of increasing interest to identify eGenes, the genes whose expression levels are associated with variation at a particular genetic variant. Detecting eGenes is important for follow-up analyses and prioritization because genes are the main entities in biological processes. To detect eGenes, one typically focuses on the genetic variant with the minimum p value among all variants in cis with a gene and corrects for multiple testing to obtain a gene-level p value. For performing multiple-testing correction, a permutation test is widely used. Because of growing sample sizes of eQTL studies, however, the permutation test has become a computational bottleneck in eQTL studies. In this paper, we propose an efficient approach for correcting for multiple testing and assess eGene p values by utilizing a multivariate normal distribution. Our approach properly takes into account the linkage-disequilibrium structure among variants, and its time complexity is independent of sample size. By applying our small-sample correction techniques, our method achieves high accuracy in both small and large studies. We have shown that our method consistently produces extremely accurate p values (accuracy > 98%) for three human eQTL datasets with different sample sizes and SNP densities: the Genotype-Tissue Expression pilot dataset, the multi-region brain dataset, and the HapMap 3 dataset.
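The general principle of replacing a permutation test with a multivariate normal (MVN) null can be sketched as follows. Under the null, the association z-scores of the cis variants are approximately MVN with covariance equal to their linkage-disequilibrium correlation matrix, so a gene-level p-value for the strongest variant can be estimated by sampling from that distribution. This is a plain Monte Carlo illustration. It does not include the article's small-sample correction or its efficiency techniques, and the function names and the example correlation matrix are hypothetical.

```python
import math
import random

def cholesky(matrix):
    """Lower-triangular Cholesky factor of a positive-definite matrix."""
    n = len(matrix)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(matrix[i][i] - s)
            else:
                L[i][j] = (matrix[i][j] - s) / L[j][j]
    return L

def mvn_gene_level_p(max_abs_z, ld_corr, n_samples=20000, seed=0):
    """Estimate P(max |z| >= max_abs_z) when z ~ MVN(0, ld_corr)."""
    rng = random.Random(seed)
    L = cholesky(ld_corr)
    n = len(ld_corr)
    exceed = 0
    for _ in range(n_samples):
        # Draw independent standard normals and correlate them via L.
        e = [rng.gauss(0.0, 1.0) for _ in range(n)]
        z = [sum(L[i][k] * e[k] for k in range(i + 1)) for i in range(n)]
        if max(abs(v) for v in z) >= max_abs_z:
            exceed += 1
    return (exceed + 1) / (n_samples + 1)

# Hypothetical LD correlation matrix for three cis variants.
ld = [
    [1.0, 0.8, 0.3],
    [0.8, 1.0, 0.4],
    [0.3, 0.4, 1.0],
]
print(mvn_gene_level_p(3.0, ld))
```

The cost of this loop depends on the number of variants and the number of draws, not on the number of individuals, which matches the abstract's point that the time complexity is independent of sample size.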


Subject(s)
Data Interpretation, Statistical , Gene Expression Regulation/genetics , Genes/genetics , Genetic Variation , Quantitative Trait Loci/genetics , Humans , Multivariate Analysis , Normal Distribution , Polymorphism, Single Nucleotide/genetics , Probability , Sample Size , Statistics, Nonparametric