Results 1 - 20 of 53
1.
Genome Res ; 31(6): 935-946, 2021 06.
Article in English | MEDLINE | ID: mdl-33963077

ABSTRACT

Genomic deletions provide a powerful loss-of-function model in noncoding regions to assess the role of purifying selection on genetic variation. Regulatory element function is characterized by nonuniform tissue and cell type activity, necessarily linking the study of fitness consequences from regulatory variants to their corresponding cellular activity. We generated a callset of deletions from genomes in the Alzheimer's Disease Neuroimaging Initiative (ADNI) and used deletions from The 1000 Genomes Project Consortium (1000GP) in order to examine whether purifying selection preserves noncoding sites of chromatin accessibility marked by DNase I hypersensitivity (DHS), histone modification (enhancer, transcribed, Polycomb-repressed, heterochromatin), and chromatin loop anchors. To examine this in a cellular activity-aware manner, we developed a statistical method, pleiotropy ratio score (PlyRS), which calculates a correlation-adjusted count of "cellular pleiotropy" for each noncoding base pair by analyzing shared regulatory annotations across tissues and cell types. By comparing real deletion PlyRS values to simulations in a length-matched framework and by using genomic covariates in analyses, we found that purifying selection acts to preserve both DHS and enhancer noncoding sites. However, we did not find evidence of purifying selection for noncoding transcribed, Polycomb-repressed, or heterochromatin sites beyond that of the noncoding background. Additionally, we found evidence that purifying selection is acting on chromatin loop integrity by preserving colocalized CTCF binding sites. At regions of DHS, enhancer, and CTCF within chromatin loop anchors, we found evidence that both sites of activity specific to a particular tissue or cell type and sites of cellularly pleiotropic activity are preserved by selection.


Subject(s)
Chromatin, Genomics, Binding Sites, Chromatin/genetics, Humans, Polycomb-Group Proteins/metabolism
2.
PLoS Genet ; 17(6): e1009596, 2021 06.
Article in English | MEDLINE | ID: mdl-34061836

ABSTRACT

The rapid decrease in sequencing cost has enabled genetic studies to discover rare variants associated with complex diseases and traits. Once such an association is identified, the next step is to understand the mechanism by which the rare variants influence disease. Similar to the hypothesis for common variants, rare variants may affect diseases by regulating gene expression, and recently, several studies have identified the effects of rare variants on gene expression using heritability and expression outlier analyses. However, identifying individual genes whose expression is regulated by rare variants has been challenging due to the relatively small sample size of expression quantitative trait loci studies and statistical approaches not optimized to detect the effects of rare variants. In this study, we analyze whole-genome sequencing and RNA-seq data of 681 European individuals collected for the Genotype-Tissue Expression (GTEx) project (v8) to identify individual genes in 49 human tissues whose expression is regulated by rare variants. To improve statistical power, we develop an approach based on a likelihood ratio test that combines effects of multiple rare variants in a nonlinear manner and has higher power than previous approaches. Using GTEx data, we identify many genes regulated by rare variants, and some of them are only regulated by rare variants and not by common variants. We also find that genes regulated by rare variants are enriched for expression outliers and disease-causing genes. These results suggest the regulatory effects of rare variants, which would be important in interpreting associations of rare variants with complex traits.


Subject(s)
Gene Expression Regulation, Quantitative Trait Loci, Humans, Multifactorial Inheritance
3.
PLoS Genet ; 17(9): e1009772, 2021 09.
Article in English | MEDLINE | ID: mdl-34516545

ABSTRACT

Late-onset Alzheimer's disease (LOAD) is the most common type of dementia causing irreversible brain damage to the elderly and presents a major public health challenge. Clinical research and genome-wide association studies have suggested a potential contribution of the endocytic pathway to AD, with an emphasis on common loci. However, the contribution of rare variants in this pathway to AD has not been thoroughly investigated. In this study, we focused on the effect of rare variants on AD by first applying a rare-variant gene-set burden analysis using genes in the endocytic pathway on over 3,000 individuals with European ancestry from three large whole-genome sequencing (WGS) studies. We identified significant associations of rare-variant burden within the endocytic pathway with AD, which were successfully replicated in independent datasets. We further demonstrated that this endocytic rare-variant enrichment is associated with neurofibrillary tangles (NFTs) and age-related phenotypes, increasing the risk of more severe brain damage, earlier age-at-onset, and earlier age-of-death. Next, by aggregating rare variants within each gene, we sought to identify single endocytic genes associated with AD and NFTs. Careful examination using NFTs revealed one significantly associated gene, ANKRD13D. To identify functional associations, we integrated bulk RNA-Seq data from over 600 brain tissues and found two endocytic expression genes (eGenes), HLA-A and SLC26A7, that displayed significant influences on their gene expressions. Differential expression of these three identified genes between AD patients and controls was further examined by incorporating scRNA-Seq data from 48 post-mortem brain samples, which demonstrated distinct expression patterns across cell types. Taken together, our results demonstrated a strong rare-variant effect in the endocytic pathway on AD risk and progression and a functional effect of gene expression alteration at both bulk and single-cell resolution, which may bring more insight and serve as valuable resources for future AD genetic studies, clinical research, and therapeutic targeting.


Subject(s)
Alzheimer Disease/pathology, Endocytosis, Phenotype, Alzheimer Disease/genetics, Genome-Wide Association Study, Humans, Single Nucleotide Polymorphism, Whole Genome Sequencing
4.
Bioinformatics ; 37(1): 9-16, 2021 Apr 09.
Article in English | MEDLINE | ID: mdl-33416856

ABSTRACT

MOTIVATION: Since the first human genome was sequenced in 2001, there has been a rapid growth in the number of bioinformatic methods to process and analyze next-generation sequencing (NGS) data for research and clinical studies that aim to identify genetic variants influencing diseases and traits. To achieve this goal, one first needs to call genetic variants from NGS data, which requires multiple computationally intensive analysis steps. Unfortunately, there is a lack of an open-source pipeline that can perform all these steps on NGS data in a manner that is fully automated, efficient, rapid, scalable, modular, user-friendly and fault tolerant. To address this, we introduce xGAP, an extensible Genome Analysis Pipeline, which implements modified GATK best practice to analyze DNA-seq data with the aforementioned functionalities. RESULTS: xGAP implements massive parallelization of the modified GATK best practice pipeline by splitting a genome into many smaller regions with efficient load-balancing to achieve high scalability. It can process 30× coverage whole-genome sequencing (WGS) data in ∼90 min. In terms of accuracy of discovered variants, xGAP achieves average F1 scores of 99.37% for single nucleotide variants and 99.20% for insertion/deletions across seven benchmark WGS datasets. We achieve highly consistent results across multiple on-premises (SGE & SLURM) high-performance clusters. Compared to the Churchill pipeline, with similar parallelization, xGAP is 20% faster when analyzing 50× coverage WGS on Amazon Web Service. Finally, xGAP is user-friendly and fault tolerant in that it can automatically re-initiate failed processes to minimize required user intervention. AVAILABILITY AND IMPLEMENTATION: xGAP is available at https://github.com/Adigorla/xgap. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
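
The region-splitting idea in the RESULTS paragraph above can be illustrated with a minimal sketch. This is not xGAP's code: the function names `split_genome` and `balance`, the region size, and the toy chromosome lengths are all assumptions made for illustration. The sketch cuts each chromosome into fixed-width regions and then hands regions to workers greedily, largest first, so that the number of bases per worker stays roughly even.

```python
import heapq

def split_genome(chrom_lengths, region_size):
    """Cut each chromosome into half-open regions of at most region_size bases."""
    regions = []
    for chrom, length in chrom_lengths.items():
        for start in range(0, length, region_size):
            regions.append((chrom, start, min(start + region_size, length)))
    return regions

def balance(regions, n_workers):
    """Greedy longest-first assignment of regions to the least-loaded worker."""
    heap = [(0, w) for w in range(n_workers)]  # (bases assigned, worker id)
    heapq.heapify(heap)
    assignment = {w: [] for w in range(n_workers)}
    for region in sorted(regions, key=lambda r: r[2] - r[1], reverse=True):
        load, worker = heapq.heappop(heap)
        assignment[worker].append(region)
        heapq.heappush(heap, (load + region[2] - region[1], worker))
    return assignment

# Hypothetical toy lengths, not real chromosome sizes.
toy = {"chrA": 2500, "chrB": 1000}
regions = split_genome(toy, region_size=1000)
plan = balance(regions, n_workers=2)
```

A real pipeline would also have to handle variants that span region boundaries; that detail is left out of this sketch.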

5.
PLoS Genet ; 15(12): e1008481, 2019 12.
Article in English | MEDLINE | ID: mdl-31834882

ABSTRACT

Many disease risk loci identified in genome-wide association studies are present in non-coding regions of the genome. Previous studies have found enrichment of expression quantitative trait loci (eQTLs) in disease risk loci, indicating that identifying causal variants for gene expression is important for elucidating the genetic basis of not only gene expression but also complex traits. However, detecting causal variants is challenging due to complex genetic correlation among variants known as linkage disequilibrium (LD) and the presence of multiple causal variants within a locus. Although several fine-mapping approaches have been developed to overcome these challenges, they may produce large sets of putative causal variants when true causal variants are in high LD with many non-causal variants. In eQTL studies, there is an additional source of information that can be used to improve fine-mapping called allelic imbalance (AIM) that measures imbalance in gene expression on two chromosomes of a diploid organism. In this work, we develop a novel statistical method that leverages both AIM and total expression data to detect causal variants that regulate gene expression. We illustrate through simulations and application to 10 tissues of the Genotype-Tissue Expression (GTEx) dataset that our method identifies the true causal variants with higher specificity than an approach that uses only eQTL information. Across all tissues and genes, our method achieves a median reduction rate of 11% in the number of putative causal variants. We use chromatin state data from the Roadmap Epigenomics Consortium to show that the putative causal variants identified by our method are enriched for active regions of the genome, providing orthogonal support that our method identifies causal variants with increased specificity.


Subject(s)
Allelic Imbalance, Chromatin/genetics, Chromosome Mapping/methods, Quantitative Trait Loci, Genetic Predisposition to Disease, Genome-Wide Association Study, Humans, Linkage Disequilibrium, Multifactorial Inheritance, Single Nucleotide Polymorphism
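
As a rough illustration of what "allelic imbalance" measures in the abstract above, the following sketch tests whether reads from the two alleles at a single heterozygous site depart from a 50:50 ratio. The exact binomial test is a common simple choice and is an assumption here; it is not the statistical model developed in the paper, and the function name and read counts are hypothetical.

```python
from math import comb

def allelic_imbalance_pvalue(ref_reads, alt_reads):
    """Two-sided exact binomial test of ref:alt read counts against a 50:50 ratio."""
    n = ref_reads + alt_reads
    if n == 0:
        return 1.0
    observed = comb(n, ref_reads)
    # Sum the probabilities of all outcomes no more likely than the observed one.
    tail = sum(comb(n, k) for k in range(n + 1) if comb(n, k) <= observed)
    return tail / 2 ** n

# Hypothetical read counts at one heterozygous site.
balanced = allelic_imbalance_pvalue(10, 10)
skewed = allelic_imbalance_pvalue(18, 2)
```

A small p value for the skewed site suggests that one chromosome copy is expressed more than the other, which is the extra signal the abstract says can be combined with total expression.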
6.
Proc Natl Acad Sci U S A ; 116(25): 12516-12523, 2019 06 18.
Article in English | MEDLINE | ID: mdl-31164420

ABSTRACT

BACE1 is the rate-limiting enzyme for amyloid-β peptides (Aβ) generation, a key event in the pathogenesis of Alzheimer's disease (AD). By an unknown mechanism, levels of BACE1 and a BACE1 mRNA-stabilizing antisense RNA (BACE1-AS) are elevated in the brains of AD patients, implicating that dysregulation of BACE1 expression plays an important role in AD pathogenesis. We found that nuclear factor erythroid-derived 2-related factor 2 (NRF2/NFE2L2) represses the expression of BACE1 and BACE1-AS through binding to antioxidant response elements (AREs) in their promoters of mouse and human. NRF2-mediated inhibition of BACE1 and BACE1-AS expression is independent of redox regulation. NRF2 activation decreases production of BACE1 and BACE1-AS transcripts and Aβ production and ameliorates cognitive deficits in animal models of AD. Depletion of NRF2 increases BACE1 and BACE1-AS expression and Aβ production and worsens cognitive deficits. Our findings suggest that activation of NRF2 can prevent a key early pathogenic process in AD.


Subject(s)
Alzheimer Disease/metabolism, Amyloid Precursor Protein Secretases/metabolism, Aspartic Acid Endopeptidases/metabolism, Cognition Disorders/metabolism, NF-E2-Related Factor 2/metabolism, Alzheimer Disease/pathology, Amyloid Precursor Protein Secretases/genetics, Amyloid beta-Peptides/metabolism, Animals, Aspartic Acid Endopeptidases/genetics, Cognition Disorders/pathology, Animal Disease Models, Gene Expression Regulation, Humans, Isothiocyanates/pharmacology, Mice, Transgenic Mice, NF-E2-Related Factor 2/biosynthesis, Genetic Promoter Regions, Protein Binding, Reactive Oxygen Species/metabolism, Sulfoxides, Genetic Transcription
7.
Am J Hum Genet ; 103(5): 707-726, 2018 11 01.
Article in English | MEDLINE | ID: mdl-30401458

ABSTRACT

Most population isolates examined to date were founded from a single ancestral population. Consequently, there is limited knowledge about the demographic history of admixed population isolates. Here we investigate genomic diversity of recently admixed population isolates from Costa Rica and Colombia and compare their diversity to a benchmark population isolate, the Finnish. These Latin American isolates originated during the 16th century from admixture between a few hundred European males and Amerindian females, with a limited contribution from African founders. We examine whole-genome sequence data from 449 individuals, ascertained as families to build multigenerational pedigrees, with a mean sequencing depth of coverage of approximately 36×. We find that Latin American isolates have increased genetic diversity relative to the Finnish. However, there is an increase in the amount of identity by descent (IBD) segments in the Latin American isolates relative to the Finnish. The increase in IBD segments is likely a consequence of a very recent and severe population bottleneck during the founding of the admixed population isolates. Furthermore, the proportion of the genome that falls within a long run of homozygosity (ROH) in Costa Rican and Colombian individuals is significantly greater than that in the Finnish, suggesting more recent consanguinity in the Latin American isolates relative to that seen in the Finnish. Lastly, we find that recent consanguinity increased the number of deleterious variants found in the homozygous state, which is relevant if deleterious variants are recessive. Our study suggests that there is no single genetic signature of a population isolate.


Subject(s)
Human Genome/genetics, Colombia, Consanguinity, Costa Rica, Female, Population Genetics/methods, Genomics/methods, Homozygote, Humans, Male, Pedigree, White People/genetics, Whole Genome Sequencing/methods
8.
PLoS Genet ; 14(12): e1007309, 2018 12.
Article in English | MEDLINE | ID: mdl-30589851

ABSTRACT

A genome-wide association study (GWAS) seeks to identify genetic variants that contribute to the development and progression of a specific disease. Over the past 10 years, new approaches using mixed models have emerged to mitigate the deleterious effects of population structure and relatedness in association studies. However, developing GWAS techniques to accurately test for association while correcting for population structure is a computational and statistical challenge. Using laboratory mouse strains as an example, our review characterizes the problem of population structure in association studies and describes how it can cause false positive associations. We then motivate mixed models in the context of unmodeled factors.


Subject(s)
Population Genetics, Genome-Wide Association Study/methods, Genetic Models, Animals, Bias, Disease/genetics, Female, Genome-Wide Association Study/statistics & numerical data, Humans, Linear Models, Male, Mice, Statistical Models, Pedigree, Phenotype, Phylogeny, Single Nucleotide Polymorphism
9.
PLoS Comput Biol ; 15(12): e1007556, 2019 12.
Article in English | MEDLINE | ID: mdl-31851693

ABSTRACT

Next-generation sequencing technology (NGS) enables the discovery of nearly all genetic variants present in a genome. A subset of these variants, however, may have poor sequencing quality due to limitations in NGS or variant callers. In genetic studies that analyze a large number of sequenced individuals, it is critical to detect and remove those variants with poor quality as they may cause spurious findings. In this paper, we present ForestQC, a statistical tool for performing quality control on variants identified from NGS data by combining a traditional filtering approach and a machine learning approach. Our software uses the information on sequencing quality, such as sequencing depth, genotyping quality, and GC contents, to predict whether a particular variant is likely to be false-positive. To evaluate ForestQC, we applied it to two whole-genome sequencing datasets where one dataset consists of related individuals from families while the other consists of unrelated individuals. Results indicate that ForestQC outperforms widely used methods for performing quality control on variants such as VQSR of GATK by considerably improving the quality of variants to be included in the analysis. ForestQC is also very efficient, and hence can be applied to large sequencing datasets. We conclude that combining a machine learning algorithm trained with sequencing quality information and the filtering approach is a practical approach to perform quality control on genetic variants from sequencing data.


Subject(s)
Genetic Variation, High-Throughput Nucleotide Sequencing/statistics & numerical data, Software, Algorithms, Computational Biology, Genetic Databases/statistics & numerical data, High-Throughput Nucleotide Sequencing/standards, Humans, Machine Learning, Single Nucleotide Polymorphism, Quality Control, Whole Genome Sequencing/standards, Whole Genome Sequencing/statistics & numerical data
10.
Am J Hum Genet ; 99(4): 846-859, 2016 Oct 06.
Article in English | MEDLINE | ID: mdl-27666371

ABSTRACT

Recently, multiple studies have performed whole-exome or whole-genome sequencing to identify groups of rare variants associated with complex traits and diseases. They have primarily utilized case-control study designs that often require thousands of individuals to reach acceptable statistical power. Family-based studies can be more powerful because a rare variant can be enriched in an extended pedigree and segregate with the phenotype. Although many methods have been proposed for using family data to discover rare variants involved in a disease, a majority of them focus on a specific pedigree structure and are designed to analyze either binary or continuously measured outcomes. In this article, we propose RareIBD, a general and powerful approach to identifying rare variants involved in disease susceptibility. Our method can be applied to large extended families of arbitrary structure, including pedigrees with only affected individuals. The method accommodates both binary and quantitative traits. A series of simulation experiments suggest that RareIBD is a powerful test that outperforms existing approaches. In addition, our method accounts for individuals in top generations, which are not usually genotyped in extended families. In contrast to available statistical tests, RareIBD generates accurate p values even when genetic data from these individuals are missing. We applied RareIBD, as well as other methods, to two extended family datasets generated by different genotyping technologies and representing different ethnicities. The analysis of real data confirmed that RareIBD is the only method that properly controls type I error.


Subject(s)
Family, Genetic Predisposition to Disease/genetics, Genetic Variation/genetics, Pedigree, Datasets as Topic, Ethnicity/genetics, Female, Genotype, Humans, Male, Genetic Models, Phenotype, Research Design
11.
Am J Hum Genet ; 99(6): 1245-1260, 2016 Dec 01.
Article in English | MEDLINE | ID: mdl-27866706

ABSTRACT

The vast majority of genome-wide association study (GWAS) risk loci fall in non-coding regions of the genome. One possible hypothesis is that these GWAS risk loci alter the individual's disease risk through their effect on gene expression in different tissues. In order to understand the mechanisms driving a GWAS risk locus, it is helpful to determine which gene is affected in specific tissue types. For example, the relevant gene and tissue could play a role in the disease mechanism if the same variant responsible for a GWAS locus also affects gene expression. Identifying whether or not the same variant is causal in both GWASs and expression quantitative trait locus (eQTL) studies is challenging because of the uncertainty induced by linkage disequilibrium and the fact that some loci harbor multiple causal variants. However, current methods that address this problem assume that each locus contains a single causal variant. In this paper, we present eCAVIAR, a probabilistic method that has several key advantages over existing methods. First, our method can account for more than one causal variant in any given locus. Second, it can leverage summary statistics without accessing the individual genotype data. We use both simulated and real datasets to demonstrate the utility of our method. Using publicly available eQTL data on 45 different tissues, we demonstrate that eCAVIAR can prioritize likely relevant tissues and target genes for a set of glucose- and insulin-related trait loci.


Subject(s)
Genetic Predisposition to Disease/genetics, Genome-Wide Association Study/methods, Genetic Models, Statistical Models, Quantitative Trait Loci/genetics, Datasets as Topic, Gene Expression Regulation/genetics, Genotype, Glucose/metabolism, Humans, Insulin/metabolism, Linkage Disequilibrium, Organ Specificity, Probability, Sample Size
12.
PLoS Genet ; 12(3): e1005849, 2016 Mar.
Article in English | MEDLINE | ID: mdl-26943367

ABSTRACT

Although genome-wide association studies (GWASs) have discovered numerous novel genetic variants associated with many complex traits and diseases, those genetic variants typically explain only a small fraction of phenotypic variance. Factors that account for phenotypic variance include environmental factors and gene-by-environment interactions (GEIs). Recently, several studies have conducted genome-wide gene-by-environment association analyses and demonstrated important roles of GEIs in complex traits. One of the main challenges in these association studies is to control effects of population structure that may cause spurious associations. Many studies have analyzed how population structure influences statistics of genetic variants and developed several statistical approaches to correct for population structure. However, the impact of population structure on GEI statistics in GWASs has not been extensively studied, nor have methods been designed to correct for population structure on GEI statistics. In this paper, we show both analytically and empirically that population structure may cause spurious GEIs and use both simulation and two GWAS datasets to support our finding. We propose a statistical approach based on mixed models to account for population structure on GEI statistics. We find that our approach effectively controls population structure on statistics for GEIs as well as for genetic variants.


Subject(s)
Gene-Environment Interaction, Population Genetics, Human Genome, Genome-Wide Association Study/methods, Computer Simulation, Humans, Genetic Models, Phenotype, Single Nucleotide Polymorphism/genetics
13.
Am J Respir Cell Mol Biol ; 58(3): 391-401, 2018 03.
Article in English | MEDLINE | ID: mdl-29077507

ABSTRACT

Obstructive sleep apnea (OSA) is a common heritable disorder displaying marked sexual dimorphism in disease prevalence and progression. Previous genetic association studies have identified a few genetic loci associated with OSA and related quantitative traits, but they have only focused on single ethnic groups, and a large proportion of the heritability remains unexplained. The apnea-hypopnea index (AHI) is a commonly used quantitative measure characterizing OSA severity. Because OSA differs by sex, and the pathophysiology of obstructive events differs in rapid eye movement (REM) and non-REM (NREM) sleep, we hypothesized that additional genetic association signals would be identified by analyzing the NREM/REM-specific AHI and by conducting sex-specific analyses in multiethnic samples. We performed genome-wide association tests for up to 19,733 participants of African, Asian, European, and Hispanic/Latino American ancestry in 7 studies. We identified rs12936587 on chromosome 17 as a possible quantitative trait locus for NREM AHI in men (N = 6,737; P = 1.7 × 10⁻⁸) but not in women (P = 0.77). The association with NREM AHI was replicated in a physiological research study (N = 67; P = 0.047). This locus overlapping the RAI1 gene and encompassing genes PEMT1, SREBF1, and RASD1 was previously reported to be associated with coronary artery disease, lipid metabolism, and implicated in Potocki-Lupski syndrome and Smith-Magenis syndrome, which are characterized by abnormal sleep phenotypes. We also identified gene-by-sex interactions in suggestive association regions, suggesting that genetic variants for AHI appear to vary by sex, consistent with the clinical observations of strong sexual dimorphism.


Subject(s)
Genome-Wide Association Study, Quantitative Trait Loci/genetics, Obstructive Sleep Apnea/genetics, REM Sleep/physiology, Transcription Factors/genetics, Adult, Aged, Female, Humans, Male, Middle Aged, Phosphatidylethanolamine N-Methyltransferase/genetics, Sex Characteristics, Sterol Regulatory Element Binding Protein 1/genetics, Trans-Activators, ras Proteins/genetics
14.
Hum Mol Genet ; 25(9): 1857-66, 2016 05 01.
Article in English | MEDLINE | ID: mdl-26908615

ABSTRACT

Meta-analysis strategies have become critical to augment power of genome-wide association studies (GWAS). To reduce genotyping or sequencing cost, many studies today utilize shared controls, and these individuals can inadvertently overlap among multiple studies. If these overlapping individuals are not taken into account in meta-analysis, they can induce spurious associations. In this article, we propose a general framework for adjusting association statistics to account for overlapping subjects within a meta-analysis. The key idea of our method is to transform the covariance structure of the data, so it can be used in downstream analyses. As a result, the strategy is very flexible and allows a wide range of meta-analysis methods, such as the random effects model, to account for overlapping subjects. Using simulations and real datasets, we demonstrate that our method has utility in meta-analyses of GWAS, as well as in a multi-tissue mouse expression quantitative trait loci (eQTL) study where our method increases the number of discovered eQTL by up to 19% compared with existing methods.


Subject(s)
Disease/genetics, Genome-Wide Association Study/methods, Meta-Analysis as Topic, Single Nucleotide Polymorphism/genetics, Quantitative Trait Loci/genetics, Animals, Case-Control Studies, Gene Expression Profiling, Humans, Mice, Theoretical Models
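
To see why overlapping subjects matter in the meta-analysis described above, consider a minimal sketch that combines two effect estimates whose sampling errors are correlated. It uses a generic generalized-least-squares (GLS) estimator and is not the covariance transformation proposed in the paper; the function name `gls_meta`, the example numbers, and the assumption that the correlation `r` is already known are all illustrative.

```python
from math import sqrt

def gls_meta(b1, se1, b2, se2, r):
    """Combine two effect estimates whose sampling errors have correlation r."""
    v1, v2, c = se1 ** 2, se2 ** 2, r * se1 * se2
    det = v1 * v2 - c ** 2
    # Row sums of the inverse 2x2 covariance matrix act as the study weights.
    w1 = (v2 - c) / det
    w2 = (v1 - c) / det
    beta = (w1 * b1 + w2 * b2) / (w1 + w2)
    se = sqrt(1.0 / (w1 + w2))
    return beta, se

# Hypothetical effect sizes and standard errors for two studies.
independent = gls_meta(0.2, 0.1, 0.4, 0.1, r=0.0)
overlapping = gls_meta(0.2, 0.1, 0.4, 0.1, r=0.5)
```

With `r=0.5` the combined standard error is larger than with `r=0.0`. Treating overlapping studies as independent therefore understates the uncertainty, which is how spurious associations can arise.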
15.
Am J Hum Genet ; 96(6): 857-68, 2015 Jun 04.
Article in English | MEDLINE | ID: mdl-26027500

ABSTRACT

In studies of expression quantitative trait loci (eQTLs), it is of increasing interest to identify eGenes, the genes whose expression levels are associated with variation at a particular genetic variant. Detecting eGenes is important for follow-up analyses and prioritization because genes are the main entities in biological processes. To detect eGenes, one typically focuses on the genetic variant with the minimum p value among all variants in cis with a gene and corrects for multiple testing to obtain a gene-level p value. For performing multiple-testing correction, a permutation test is widely used. Because of growing sample sizes of eQTL studies, however, the permutation test has become a computational bottleneck in eQTL studies. In this paper, we propose an efficient approach for correcting for multiple testing and assess eGene p values by utilizing a multivariate normal distribution. Our approach properly takes into account the linkage-disequilibrium structure among variants, and its time complexity is independent of sample size. By applying our small-sample correction techniques, our method achieves high accuracy in both small and large studies. We have shown that our method consistently produces extremely accurate p values (accuracy > 98%) for three human eQTL datasets with different sample sizes and SNP densities: the Genotype-Tissue Expression pilot dataset, the multi-region brain dataset, and the HapMap 3 dataset.


Subject(s)
Statistical Data Interpretation, Gene Expression Regulation/genetics, Genes/genetics, Genetic Variation, Quantitative Trait Loci/genetics, Humans, Multivariate Analysis, Normal Distribution, Single Nucleotide Polymorphism/genetics, Probability, Sample Size, Nonparametric Statistics
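
The permutation test that the abstract above describes as the computational bottleneck can be sketched as follows. This illustrates the conventional approach, not the multivariate-normal method the paper proposes. It uses the largest absolute Pearson correlation across cis variants as the test statistic, which ranks variants in the same order as the minimum p value when all variants share one sample size. The function names and toy data are hypothetical.

```python
import random
from statistics import mean

def abs_corr(x, y):
    """Absolute Pearson correlation between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return abs(sxy) / (sxx * syy) ** 0.5 if sxx > 0 and syy > 0 else 0.0

def egene_permutation_pvalue(expression, genotypes, n_perm=1000, seed=0):
    """Gene-level p value: how often the best cis association in permuted data
    is at least as strong as the best association in the observed data."""
    rng = random.Random(seed)
    observed = max(abs_corr(expression, g) for g in genotypes)
    shuffled = list(expression)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if max(abs_corr(shuffled, g) for g in genotypes) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)

# Hypothetical toy data: 8 individuals, 3 cis variants coded as 0/1/2.
genotypes = [
    [0, 0, 1, 1, 2, 2, 1, 0],
    [1, 0, 1, 0, 1, 0, 1, 0],
    [2, 2, 1, 1, 0, 0, 1, 2],
]
expression = [1.0, 1.2, 2.1, 1.9, 3.2, 2.9, 2.0, 0.9]
p_gene = egene_permutation_pvalue(expression, genotypes, n_perm=200)
```

Every permutation recomputes the association at every variant over all individuals, so the cost grows with sample size, number of variants, and number of permutations. That scaling is the motivation the abstract gives for replacing the procedure.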
16.
Bioinformatics ; 33(14): i67-i74, 2017 Jul 15.
Article in English | MEDLINE | ID: mdl-28881962

ABSTRACT

MOTIVATION: There is recent interest in using gene expression data to contextualize findings from traditional genome-wide association studies (GWAS). Conditioned on a tissue, expression quantitative trait loci (eQTLs) are genetic variants associated with gene expression, and eGenes are genes whose expression levels are associated with genetic variants. eQTLs and eGenes provide great supporting evidence for GWAS hits and important insights into the regulatory pathways involved in many diseases. When a significant variant or a candidate gene identified by GWAS is also an eQTL or eGene, there is strong evidence to further study this variant or gene. Multi-tissue gene expression datasets like the Genotype-Tissue Expression (GTEx) data are used to find eQTLs and eGenes. Unfortunately, these datasets often have small sample sizes in some tissues. For this reason, there have been many meta-analysis methods designed to combine gene expression data across many tissues to increase power for finding eQTLs and eGenes. However, these existing techniques are not scalable to datasets containing many tissues, like the GTEx data. Furthermore, these methods ignore a biological insight that the same variant may be associated with the same gene across similar tissues. RESULTS: We introduce a meta-analysis model that addresses these problems in existing methods. We focus on the problem of finding eGenes in gene expression data from many tissues, and show that our model is better than other types of meta-analyses. AVAILABILITY AND IMPLEMENTATION: Source code is at https://github.com/datduong/RECOV. CONTACT: eeskin@cs.ucla.edu or datdb@cs.ucla.edu. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Computational Biology/methods, Genetic Variation, Quantitative Trait Loci, Software, Gene Expression Profiling/methods, Genome-Wide Association Study/methods, Humans, Meta-Analysis as Topic, Genetic Models
17.
Bioinformatics ; 32(12): i156-i163, 2016 06 15.
Article in English | MEDLINE | ID: mdl-27307612

ABSTRACT

MOTIVATION: Expression quantitative trait loci (eQTLs) are genetic variants that affect gene expression. In eQTL studies, one important task is to find eGenes, or genes whose expression is associated with at least one eQTL. The standard statistical method to determine whether a gene is an eGene requires association testing at all nearby variants and the permutation test to correct for multiple testing. The standard method, however, does not consider genomic annotation of the variants. In practice, variants near gene transcription start sites (TSSs) or certain histone modifications are likely to regulate gene expression. In this article, we introduce a novel eGene detection method that considers this empirical evidence and thereby increases the statistical power. RESULTS: We applied our method to the liver Genotype-Tissue Expression (GTEx) data using distance from TSSs, DNase hypersensitivity sites, and six histone modifications as the genomic annotations for the variants. Each of these annotations helped us detect more candidate eGenes. Distance from TSS appears to be the most important annotation; specifically, using this annotation, our method discovered 50% more candidate eGenes than the standard permutation method. CONTACT: buhm.han@amc.seoul.kr or eeskin@cs.ucla.edu.


Subject(s)
Genomics , Genetic Variation , Genotype , Polymorphism, Single Nucleotide , Quantitative Trait Loci
18.
PLoS Genet ; 9(6): e1003491, 2013 Jun.
Article in English | MEDLINE | ID: mdl-23785294

ABSTRACT

Gene expression data, in conjunction with information on genetic variants, have enabled studies to identify expression quantitative trait loci (eQTLs), or polymorphic locations in the genome that are associated with expression levels. Moreover, recent technological developments and cost decreases have further enabled studies to collect expression data in multiple tissues. One advantage of multiple tissue datasets is that studies can combine results from different tissues to identify eQTLs more accurately than by examining each tissue separately. The idea of aggregating results of multiple tissues is closely related to the idea of meta-analysis, which aggregates results of multiple genome-wide association studies to improve the power to detect associations. In principle, meta-analysis methods can be used to combine results from multiple tissues. However, eQTLs may have effects in only a single tissue, in all tissues, or in a subset of tissues with possibly different effect sizes. This heterogeneity of effects across multiple tissues presents a key challenge in detecting eQTLs. In this paper, we develop a framework that leverages two popular meta-analysis methods that address effect size heterogeneity to detect eQTLs across multiple tissues. We show by using simulations and multiple tissue data from mouse that our approach detects many eQTLs undetected by traditional eQTL methods. Additionally, our method provides an interpretation framework that accurately predicts whether an eQTL has an effect in a particular tissue.
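
To make the heterogeneity problem concrete, here is a minimal sketch of a textbook inverse-variance fixed-effects meta-analysis with Cochran's Q, applied to per-tissue effect estimates for one variant-gene pair. It is not the framework developed in this paper, and it does not model the dependence between tissues measured in the same individuals. The function name and the example numbers are hypothetical.

```python
from math import erfc, sqrt


def fixed_effects_meta(betas, ses):
    """Combine per-tissue effect sizes with inverse-variance weights.

    Returns the pooled effect, its standard error, the z-score, a two-sided
    normal p-value, and Cochran's Q (large Q suggests heterogeneous effects).
    """
    weights = [1.0 / se ** 2 for se in ses]
    total_w = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total_w
    se = sqrt(1.0 / total_w)
    z = beta / se
    p = erfc(abs(z) / sqrt(2.0))  # equals 2 * (1 - Phi(|z|))
    q = sum(w * (b - beta) ** 2 for w, b in zip(weights, betas))
    return beta, se, z, p, q


# Assumed example: an effect shared by all three tissues ...
shared = fixed_effects_meta([0.5, 0.5, 0.5], [0.1, 0.1, 0.1])
# ... versus an effect present in only one tissue.
specific = fixed_effects_meta([0.9, 0.0, 0.0], [0.1, 0.1, 0.1])
```

In the tissue-specific case the pooled effect is diluted to a third of the single-tissue estimate and Q is large, which is the situation that heterogeneity-aware meta-analysis methods are designed to handle.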


Subject(s)
Gene Expression , Genome-Wide Association Study , Quantitative Trait Loci/genetics , Animals , Gene Expression Profiling , Genome , Mice , Models, Theoretical , Organ Specificity
20.
Biomol Ther (Seoul) ; 32(4): 432-441, 2024 Jul 01.
Article in English | MEDLINE | ID: mdl-38835111

ABSTRACT

Systemic sclerosis is an autoimmune disease characterized by inflammatory reactions and fibrosis. Myofibroblasts are considered therapeutic targets for preventing and reversing the pathogenesis of fibrosis in systemic sclerosis. Although the mechanisms that drive differentiation into myofibroblasts are diverse, transforming growth factor β (TGF-β) is known to be a key mediator of fibrosis in systemic sclerosis. This study investigated the effects of extracellular vesicles derived from human adipose stem cells (ASC-EVs) in an in vivo systemic sclerosis model and in in vitro TGF-β1-induced dermal fibroblasts. The therapeutic effects of ASC-EVs on the in vivo systemic sclerosis model were evaluated based on dermal thickness and the number of α-smooth muscle actin (α-SMA)-expressing cells using hematoxylin and eosin staining and immunohistochemistry. Administration of ASC-EVs decreased both the dermal thickness and the number of α-SMA-expressing cells, as well as the mRNA levels of fibrotic genes such as Acta2, Ccn2, Col1a1 and Comp. Additionally, we discovered that ASC-EVs can decrease the expression of α-SMA and CTGF and suppress the TGF-β pathway by inhibiting the activation of SMAD2 in dermal fibroblasts induced by TGF-β1. Finally, TGF-β1-induced dermal fibroblasts underwent selective death upon ASC-EV treatment. These results indicate that ASC-EVs could provide a therapeutic approach for preventing and reversing systemic sclerosis.
