ABSTRACT
DNA sequence variation causes changes in gene expression, which in turn has profound effects on cellular states. These variations affect tissue development and may ultimately lead to pathological phenotypes. A genetic locus containing a sequence variation that affects gene expression is called an "expression quantitative trait locus" (eQTL). Whereas the impact of cellular context on expression levels in general is well established, much less is known about the cell-state specificity of eQTL. Previous studies differed in how they defined "dynamic eQTL". Here, we propose a unified framework distinguishing static, conditional and dynamic eQTL and suggest strategies for mapping these eQTL classes. Further, we introduce a new approach to simultaneously infer eQTL from different cell types. Using murine mRNA expression data from four stages of hematopoiesis and 14 related cellular traits, we demonstrate that static, conditional and dynamic eQTL, although derived from the same expression data, represent functionally distinct types of eQTL. While static eQTL affect generic cellular processes, non-static eQTL are more often involved in hematopoiesis and immune response. Our analysis revealed substantial effects of individual genetic variation on cell type-specific expression regulation. Among a total of 3,941 eQTL, we detected 2,729 static eQTL; 1,187 eQTL were conditionally active in one or several cell types, and 70 eQTL affected expression changes during cell type transitions. We also found evidence for feedback control mechanisms that revert the effect of an eQTL specifically in certain cell types. Loci correlated with hematological traits were enriched for conditional eQTL, demonstrating the importance of conditional eQTL for understanding the molecular mechanisms underlying physiological trait variation.
The classification proposed here has the potential to streamline and unify future analysis of conditional and dynamic eQTL as well as many other kinds of QTL data.
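The following Python sketch makes the three classes concrete by applying one simplified decision rule. It assumes two sets of p-values have already been computed: per-cell-type p-values for the genotype effect on expression, and p-values for the genotype effect on the expression change between two cell types. The threshold `ALPHA`, the function name `classify_eqtl`, and the stage labels are hypothetical. The sketch illustrates the static/conditional/dynamic distinction only; it is not the authors' mapping procedure or their joint multi-cell-type inference.

```python
from typing import Dict, Set

# Hypothetical significance threshold; a real analysis would correct for multiple testing.
ALPHA = 0.05


def classify_eqtl(
    cell_type_pvalues: Dict[str, float],
    transition_pvalues: Dict[str, float],
    alpha: float = ALPHA,
) -> Set[str]:
    """Label one locus-gene pair as static, conditional and/or dynamic.

    cell_type_pvalues: p-value of the genotype effect on expression in each cell type.
    transition_pvalues: p-value of the genotype effect on the expression change
        between two cell types, keyed by transition (e.g. "stage1->stage2").
    """
    active = {ct for ct, p in cell_type_pvalues.items() if p < alpha}
    labels: Set[str] = set()
    if active and active == set(cell_type_pvalues):
        labels.add("static")  # effect detected in every cell type
    elif active:
        labels.add("conditional")  # effect detected in a subset of cell types only
    if any(p < alpha for p in transition_pvalues.values()):
        labels.add("dynamic")  # genotype alters how expression changes between stages
    return labels


# Hypothetical example: an effect seen only in stage1, plus a genotype-dependent
# expression change from stage1 to stage2.
print(sorted(classify_eqtl(
    {"stage1": 0.001, "stage2": 0.30, "stage3": 0.45, "stage4": 0.80},
    {"stage1->stage2": 0.002},
)))
# → ['conditional', 'dynamic']
```

Because the rule returns a set, a locus can fall into more than one class. The counts reported in the abstract (2,729 + 1,187 + 70 = 3,986, more than the 3,941 total) suggest that the authors' classes can overlap as well.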
Subjects
Cell Differentiation/genetics , Chromosome Mapping , Organ Specificity/genetics , Quantitative Trait Loci/genetics , Animals , Cell Lineage , Gene Expression Regulation, Developmental , Genetic Variation , Hematopoietic Stem Cells/metabolism , Mice , Models, Theoretical
ABSTRACT
Chromatin immunoprecipitation coupled with deep sequencing (ChIP-seq) has great potential for elucidating transcriptional networks by measuring genome-wide binding of transcription factors (TFs) at high resolution. Despite the precision of these experiments, identifying the genes directly regulated by a TF (target genes) is not trivial. Numerous target gene scoring methods have been used in the past. However, their suitability for the task and their performance remain unclear, because a thorough comparative assessment of these methods is still lacking. Here we present a systematic evaluation of computational methods for defining TF targets based on ChIP-seq data. We validated predictions based on 68 ChIP-seq studies using a wide range of genomic expression data and functional information. We demonstrate that peak-to-gene assignment is the most crucial step for correct target gene prediction, and we propose a parameter-free method that performs most consistently across the evaluation tests.
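The abstract identifies peak-to-gene assignment as the most crucial step. One common strategy for it assigns each peak summit to the gene with the nearest transcription start site (TSS) on the same chromosome. The Python sketch below implements that strategy, assuming TSS coordinates and peak summits are already available as plain tuples. The function names, the peak-count score, and the optional `max_dist` cutoff are hypothetical, and this is not necessarily the parameter-free method the authors propose.

```python
import bisect
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Tuple

# chromosome -> (sorted TSS positions, gene ids in the same order)
TssIndex = Dict[str, Tuple[List[int], List[str]]]


def build_tss_index(genes: Iterable[Tuple[str, str, int]]) -> TssIndex:
    """Group (gene_id, chromosome, tss) records into per-chromosome sorted TSS lists."""
    by_chrom: Dict[str, List[Tuple[int, str]]] = defaultdict(list)
    for gene_id, chrom, tss in genes:
        by_chrom[chrom].append((tss, gene_id))
    index: TssIndex = {}
    for chrom, entries in by_chrom.items():
        entries.sort()
        index[chrom] = ([pos for pos, _ in entries], [gid for _, gid in entries])
    return index


def nearest_gene(index: TssIndex, chrom: str, summit: int,
                 max_dist: Optional[int] = None) -> Optional[str]:
    """Return the gene whose TSS is closest to the peak summit, or None."""
    if chrom not in index:
        return None
    positions, gene_ids = index[chrom]
    i = bisect.bisect_left(positions, summit)
    # Only the TSS immediately left of the summit and the one at/right of it can be nearest.
    candidates = [j for j in (i - 1, i) if 0 <= j < len(positions)]
    best = min(candidates, key=lambda j: abs(positions[j] - summit))
    if max_dist is not None and abs(positions[best] - summit) > max_dist:
        return None
    return gene_ids[best]


def count_peaks_per_gene(index: TssIndex, peaks: Iterable[Tuple[str, int]],
                         max_dist: Optional[int] = None) -> Dict[str, int]:
    """Simple target score: number of peaks assigned to each gene."""
    counts: Dict[str, int] = defaultdict(int)
    for chrom, summit in peaks:
        gene = nearest_gene(index, chrom, summit, max_dist)
        if gene is not None:
            counts[gene] += 1
    return dict(counts)


# Hypothetical toy coordinates.
idx = build_tss_index([("geneA", "chr1", 1_000), ("geneB", "chr1", 5_000), ("geneC", "chr2", 300)])
print(count_peaks_per_gene(idx, [("chr1", 1_200), ("chr1", 900), ("chr1", 4_000), ("chr2", 350)]))
# → {'geneA': 2, 'geneB': 1, 'geneC': 1}
```

Leaving `max_dist` as `None` keeps the assignment free of a distance parameter. Setting it gives the common "nearest gene within a window" variant. Counting peaks per gene is only one of many possible target scores.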
Subjects
Binding Sites , Chromatin Immunoprecipitation/methods , Genomics/methods , Transcription Factors/chemistry , Transcription Factors/genetics , Algorithms , Animals , Databases, Genetic , Genome , Mice , Models, Statistical , Reproducibility of Results , Sequence Analysis, DNA
ABSTRACT
BACKGROUND: Genome-wide association studies (GWASs) have revealed relationships between over 57,000 genetic variants and diseases. However, unlike Mendelian diseases, complex diseases arise from the interplay of multiple genetic and environmental factors. In Mendelian diseases, natural selection has led to a strong tendency for risk alleles to be enriched among minor alleles. By contrast, in complex diseases an allele that was previously advantageous or neutral may later become harmful, making it a risk allele. METHODS: Using data in the NHGRI-EBI Catalog and the VARIMED database, we investigated (1) whether GWASs detect minor alleles as risk alleles more easily than major alleles and (2) whether comparing risk allele frequencies across diseases can provide evolutionary insights. We conducted computer simulations of P-values for association tests in which either the major or the minor allele was the risk allele. We compared the expected proportion of single-nucleotide variants (SNVs) whose risk alleles were minor alleles with the observed proportion. RESULTS: Our statistical analyses revealed that risk alleles were enriched among minor alleles, especially for variants with low minor allele frequencies (MAFs < 0.1). Our computer simulations showed that more than 50% of risk alleles were minor alleles because GWASs have greater power to detect a risk allele when it is the minor allele; this power imbalance is largest at low MAFs or when the number of controls exceeds the number of cases. However, at low MAFs (< 0.1), the observed ratios of minor to major risk alleles were much larger than expected from the GWAS power imbalance alone, especially for diseases whose average risk allele frequencies were low, such as myopia, sudden cardiac arrest, and systemic lupus erythematosus. CONCLUSIONS: Minor alleles are more likely to be risk alleles in the published GWASs on complex diseases. One reason is that minor alleles are more easily detected as risk alleles in GWASs.
Even after correcting for the GWAS power imbalance, minor alleles are more likely to be risk alleles, especially in some diseases whose average risk allele frequencies are low. These analyses serve as a starting point for future studies quantifying the degree of negative natural selection in various complex diseases.
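The power imbalance can be illustrated analytically. The Python sketch below compares the approximate power of an association test when the risk allele is the minor allele (control frequency equal to the MAF) with the power when it is the major allele (control frequency 1 - MAF). It rests on several assumptions:

- the test is an allelic (two-proportion) z-test on allele counts;
- the risk allele has a fixed allelic odds ratio;
- the expected z statistic is approximated with the pooled standard error.

The function names and the genome-wide threshold `alpha=5e-8` are illustrative choices, not the authors' simulation settings.

```python
from statistics import NormalDist
from typing import Tuple

STD_NORMAL = NormalDist()  # standard normal distribution (mean 0, sd 1)


def case_freq(control_freq: float, odds_ratio: float) -> float:
    """Risk-allele frequency in cases implied by an allelic odds ratio."""
    odds = odds_ratio * control_freq / (1.0 - control_freq)
    return odds / (1.0 + odds)


def allelic_test_power(control_freq: float, odds_ratio: float, n_cases: int,
                       n_controls: int, alpha: float = 5e-8) -> float:
    """Approximate power of a two-sided two-proportion z-test on allele counts.

    Uses the pooled (null) standard error for the expected z statistic,
    a rough normal approximation.
    """
    p0 = control_freq
    p1 = case_freq(control_freq, odds_ratio)
    a1, a0 = 2 * n_cases, 2 * n_controls  # allele counts in diploid samples
    pooled = (a1 * p1 + a0 * p0) / (a1 + a0)
    se = (pooled * (1.0 - pooled) * (1.0 / a1 + 1.0 / a0)) ** 0.5
    expected_z = (p1 - p0) / se
    z_crit = STD_NORMAL.inv_cdf(1.0 - alpha / 2.0)
    # Probability that |Z| exceeds the critical value when Z ~ N(expected_z, 1).
    return STD_NORMAL.cdf(expected_z - z_crit) + STD_NORMAL.cdf(-expected_z - z_crit)


def power_minor_vs_major(maf: float, odds_ratio: float, n_cases: int,
                         n_controls: int, alpha: float = 5e-8) -> Tuple[float, float]:
    """Power when the risk allele is the minor allele vs. when it is the major allele."""
    minor = allelic_test_power(maf, odds_ratio, n_cases, n_controls, alpha)
    major = allelic_test_power(1.0 - maf, odds_ratio, n_cases, n_controls, alpha)
    return minor, major
```

Consider `power_minor_vs_major(0.05, 1.3, 2000, 8000)`, a hypothetical variant with MAF 0.05 and OR 1.3 in a study with four times as many controls as cases. The minor-allele scenario has the higher power. At MAF 0.5 the two scenarios coincide, so the imbalance appears only when the risk allele is rare or common.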
Subjects
Alleles , Computational Biology , Genetic Predisposition to Disease/genetics , Genome-Wide Association Study , Databases, Genetic , Evolution, Molecular , Humans , Linkage Disequilibrium
ABSTRACT
To identify rare mutations and retrospectively estimate the cancer risk of a 45-year-old female patient diagnosed with Li-Fraumeni syndrome (LFS), who developed nine primary malignant neoplasms over a period of 38 years, we performed next-generation sequencing. Whole-genome sequencing was performed on DNA from whole blood obtained one year before the diagnosis of acute myeloid leukemia (AML), and whole-exome sequencing on DNA from whole blood obtained at the time of AML diagnosis. We analyzed rare mutations in cancer susceptibility genes using a candidate-gene strategy and estimated cancer risk using the Risk-O-Gram algorithm. We found rare mutations in cancer susceptibility genes associated with an increased hereditary cancer risk in the patient. Notably, the number of mutated genes in the p53 signaling pathway was significantly higher than expected (p=0.02). However, the patient's phenotype of multiple malignant neoplasms was unlikely to be caused by an accumulation of common cancer risk alleles. In conclusion, we established the mutation profile in a rare case of LFS, illustrating that rare mutations, rather than the cumulative effect of common risk alleles, led to the increased cancer risk in this patient.
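The abstract does not state which test produced p=0.02. One standard way to ask whether a pathway contains more mutated genes than expected by chance is a one-sided hypergeometric test, which is equivalent to a one-sided Fisher's exact test. The Python sketch below implements it with the standard library only. The function name and the numbers in the usage line are hypothetical and do not reproduce the patient data.

```python
from math import comb


def hypergeom_enrichment_p(N: int, K: int, n: int, k: int) -> float:
    """One-sided P(X >= k) for X ~ Hypergeometric(N, K, n).

    N: genes screened, K: screened genes in the pathway,
    n: genes carrying a mutation, k: mutated genes in the pathway.
    """
    if not (0 <= K <= N and 0 <= n <= N):
        raise ValueError("require 0 <= K <= N and 0 <= n <= N")
    # math.comb returns 0 when the lower index exceeds the upper one,
    # so impossible configurations contribute nothing to the tail.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(max(k, 0), min(K, n) + 1))
    return tail / comb(N, n)


# Hypothetical numbers (not the patient's data): 5 of 40 mutated genes fall in a
# 20-gene pathway out of 500 screened genes.
print(hypergeom_enrichment_p(N=500, K=20, n=40, k=5))
```

The test treats every screened gene as equally likely to carry a mutation. Gene length and mutability violate that assumption in real data, so a pathway enrichment p-value from this model is only a first approximation.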
Subjects
Genetic Predisposition to Disease , Germ-Line Mutation/genetics , Li-Fraumeni Syndrome/genetics , Exome/genetics , Female , High-Throughput Nucleotide Sequencing , Humans , Li-Fraumeni Syndrome/pathology , Middle Aged , Signal Transduction , Tumor Suppressor Protein p53/genetics
ABSTRACT
Gene expression and disease-associated variants are often used to prioritize candidate genes for target validation. However, how successful these gene features are, alone or in combination, at discovering therapeutic targets is uncertain. Here we evaluated how effectively differential expression (DE), disease-associated single nucleotide polymorphisms (SNPs), and the combination of the two recover and predict known therapeutic targets across 56 human diseases. We demonstrate that the performance of each feature varies across diseases and that, in general, the features have more recovery power than predictive power. The combination of the two features, however, has significantly higher predictive power than either feature alone. Our study provides a systematic evaluation of two common gene features, DE and SNPs, for prioritizing candidate targets and identifies improved predictive power from coupling these two features.
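The distinction between recovery power and predictive power can be sketched with set operations. The Python example below makes three assumptions:

- recovery is the fraction of known targets captured by a candidate gene set (recall);
- predictive power is measured as precision and as fold enrichment over the background rate of known targets;
- the combined feature is the intersection of DE genes and SNP-associated genes.

These definitions and names are hypothetical simplifications. The abstract does not specify the exact metrics or combination rule.

```python
from typing import Dict, Set


def evaluate_feature(candidates: Set[str], known_targets: Set[str],
                     universe: Set[str]) -> Dict[str, float]:
    """Recovery (recall) and predictive metrics (precision, fold enrichment) of a gene set."""
    targets = known_targets & universe
    hits = candidates & targets
    recall = len(hits) / len(targets) if targets else 0.0
    precision = len(hits) / len(candidates) if candidates else 0.0
    background = len(targets) / len(universe) if universe else 0.0
    enrichment = precision / background if background else 0.0
    return {"recall": recall, "precision": precision, "enrichment": enrichment}


def compare_features(de_genes: Set[str], snp_genes: Set[str], known_targets: Set[str],
                     universe: Set[str]) -> Dict[str, Dict[str, float]]:
    """Evaluate DE alone, SNP alone and their intersection (hypothetical combination rule)."""
    return {
        "DE": evaluate_feature(de_genes, known_targets, universe),
        "SNP": evaluate_feature(snp_genes, known_targets, universe),
        "DE+SNP": evaluate_feature(de_genes & snp_genes, known_targets, universe),
    }
```

Under these definitions, the intersection is a subset of each feature's gene set, so its recall can only match or fall below that of either feature alone. Precision tends to rise when the two features carry independent evidence, which mirrors the pattern the abstract reports.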