ABSTRACT
DNA sequence variation causes changes in gene expression, which in turn has profound effects on cellular states. These variations affect tissue development and may ultimately lead to pathological phenotypes. A genetic locus containing a sequence variation that affects gene expression is called an "expression quantitative trait locus" (eQTL). Whereas the impact of cellular context on expression levels in general is well established, much less is known about the cell-state specificity of eQTL. Previous studies differed with respect to how "dynamic eQTL" were defined. Here, we propose a unified framework distinguishing static, conditional and dynamic eQTL and suggest strategies for mapping these eQTL classes. Further, we introduce a new approach to simultaneously infer eQTL from different cell types. By using murine mRNA expression data from four stages of hematopoiesis and 14 related cellular traits, we demonstrate that static, conditional and dynamic eQTL, although derived from the same expression data, represent functionally distinct types of eQTL. While static eQTL affect generic cellular processes, non-static eQTL are more often involved in hematopoiesis and immune response. Our analysis revealed substantial effects of individual genetic variation on cell type-specific expression regulation. Among a total of 3,941 eQTL, we detected 2,729 static eQTL, 1,187 eQTL that were conditionally active in one or several cell types, and 70 eQTL that affected expression changes during cell type transitions. We also found evidence for feedback control mechanisms reverting the effect of an eQTL specifically in certain cell types. Loci correlated with hematological traits were enriched for conditional eQTL, thus demonstrating the importance of conditional eQTL for understanding molecular mechanisms underlying physiological trait variation.
The classification proposed here has the potential to streamline and unify future analysis of conditional and dynamic eQTL as well as many other kinds of QTL data.
Subjects
Cell Differentiation/genetics , Chromosome Mapping , Organ Specificity/genetics , Quantitative Trait Loci/genetics , Animals , Cell Lineage , Gene Expression Regulation, Developmental , Genetic Variation , Hematopoietic Stem Cells/metabolism , Mice , Models, Theoretical
ABSTRACT
Epistatic genetic interactions are key for understanding the genetic contribution to complex traits. Epistasis is always defined with respect to some trait such as growth rate or fitness. Whereas most existing epistasis screens explicitly test for a trait, it is also possible to implicitly test for fitness traits by searching for the over- or under-representation of allele pairs in a given population. Such analysis of imbalanced allele pair frequencies of distant loci has not yet been exploited on a genome-wide scale, mostly due to statistical difficulties such as the multiple testing problem. We propose a new approach called Imbalanced Allele Pair frequencies (ImAP) for inferring epistatic interactions that is exclusively based on DNA sequence information. Our approach is based on genome-wide SNP data sampled from a population with known family structure. We make use of genotype information of parent-child trios and inspect 3×3 contingency tables for detecting pairs of alleles from different genomic positions that are over- or under-represented in the population. We also developed a simulation setup that mimics the pedigree structure while simultaneously assuming independence of the markers. When applied to mouse SNP data, our method detected 168 imbalanced allele pairs, which is substantially more than in simulations assuming no interactions. We could validate a significant number of the interactions with external data, and we found that interacting loci are enriched for genes involved in developmental processes.
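To illustrate the kind of 3×3 genotype table the abstract refers to, the following is a minimal sketch of a plain Pearson chi-square test of independence between two loci. It is not the ImAP procedure itself: it ignores the parent-child trio structure and the pedigree-aware simulation described above, and the function name and the genotype coding (0, 1, 2 copies of one allele) are assumptions made for this example.

```python
import math


def chi_square_3x3(table):
    """Pearson chi-square test of independence on a 3x3 genotype table.

    table[i][j] is the number of individuals carrying genotype i at
    locus A and genotype j at locus B (hypothetical coding: 0, 1 or 2
    copies of one allele). Assumes independent individuals, which a
    real pedigree violates.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)

    stat = 0.0
    for i in range(3):
        for j in range(3):
            expected = row_totals[i] * col_totals[j] / n
            if expected > 0:
                stat += (table[i][j] - expected) ** 2 / expected

    # Chi-square survival function for 4 degrees of freedom,
    # (3 - 1) * (3 - 1), using the closed form valid for even df.
    p_value = math.exp(-stat / 2) * (1 + stat / 2)
    return stat, p_value


balanced = [[10, 20, 10], [20, 40, 20], [10, 20, 10]]
imbalanced = [[30, 5, 5], [5, 30, 5], [5, 5, 30]]
print(chi_square_3x3(balanced))
print(chi_square_3x3(imbalanced))
```

In a genome-wide scan this test would be repeated for every pair of distant markers, which is where the multiple testing problem mentioned in the abstract arises.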
Subjects
Epistasis, Genetic , Gene Frequency/genetics , Linkage Disequilibrium/genetics , Population/genetics , Animals , Computer Simulation , Genome , Genotype , Haplotypes , Heterozygote , Humans , Mice , Models, Genetic , Pedigree , Polymorphism, Single Nucleotide/genetics
ABSTRACT
Chromatin immunoprecipitation coupled with deep sequencing (ChIP-seq) has great potential for elucidating transcriptional networks, by measuring genome-wide binding of transcription factors (TFs) at high resolution. Despite the precision of these experiments, identification of genes directly regulated by a TF (target genes) is not trivial. Numerous target gene scoring methods have been used in the past. However, their suitability for the task and their performance remain unclear, because a thorough comparative assessment of these methods is still lacking. Here we present a systematic evaluation of computational methods for defining TF targets based on ChIP-seq data. We validated predictions based on 68 ChIP-seq studies using a wide range of genomic expression data and functional information. We demonstrate that peak-to-gene assignment is the most crucial step for correct target gene prediction and propose a parameter-free method performing most consistently across the evaluation tests.
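As an illustration of what a peak-to-gene assignment step can look like, here is a minimal sketch that assigns each ChIP-seq peak to the gene with the nearest transcription start site (TSS) on the same chromosome. This is one common heuristic and is not claimed to be the parameter-free method proposed in the abstract; the function name, the tuple layouts and the example coordinates are all hypothetical.

```python
import bisect


def assign_peaks_to_nearest_tss(peaks, tss_sites):
    """Assign each peak to the gene whose TSS is closest.

    peaks:     list of (chrom, position) tuples, one per peak summit.
    tss_sites: list of (chrom, position, gene) tuples.
    Returns a list of (gene, distance) tuples in peak order, or
    (None, None) for peaks on chromosomes without any TSS.
    """
    by_chrom = {}
    for chrom, pos, gene in tss_sites:
        by_chrom.setdefault(chrom, []).append((pos, gene))
    for sites in by_chrom.values():
        sites.sort()

    assignments = []
    for chrom, pos in peaks:
        sites = by_chrom.get(chrom, [])
        if not sites:
            assignments.append((None, None))
            continue
        positions = [p for p, _ in sites]
        idx = bisect.bisect_left(positions, pos)
        # Only the TSS just before and just after the peak can be nearest.
        candidates = sites[max(idx - 1, 0): idx + 1]
        tss_pos, gene = min(candidates, key=lambda s: abs(s[0] - pos))
        assignments.append((gene, abs(tss_pos - pos)))
    return assignments


tss = [("chr1", 1000, "GeneA"), ("chr1", 5000, "GeneB"), ("chr2", 300, "GeneC")]
peaks = [("chr1", 1200), ("chr1", 4000), ("chr2", 100), ("chr3", 50)]
print(assign_peaks_to_nearest_tss(peaks, tss))
```

Alternative assignment rules, such as fixed windows around the TSS or distance-weighted scores, would replace only the candidate selection inside the loop.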
Subjects
Binding Sites , Chromatin Immunoprecipitation/methods , Genomics/methods , Transcription Factors/chemistry , Transcription Factors/genetics , Algorithms , Animals , Databases, Genetic , Genome , Mice , Models, Statistical , Reproducibility of Results , Sequence Analysis, DNA
ABSTRACT
BACKGROUND: Analysis of microarray and other high-throughput data on the basis of gene sets, rather than individual genes, is becoming more important in genomic studies. Correspondingly, a large number of statistical approaches for detecting gene set enrichment have been proposed, but both the interrelations and the relative performance of the various methods are still very much unclear. RESULTS: We conduct an extensive survey of statistical approaches for gene set analysis and identify a common modular structure underlying most published methods. Based on this finding we propose a general framework for detecting gene set enrichment. This framework provides a meta-theory of gene set analysis that not only helps to gain a better understanding of the relative merits of each embedded approach but also facilitates a principled comparison and offers insights into the relative interplay of the methods. CONCLUSION: We use this framework to conduct a computer simulation comparing 261 different variants of gene set enrichment procedures and to analyze two experimental data sets. Based on the results we offer recommendations for best practices regarding the choice of effective procedures for gene set enrichment analysis.
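For readers unfamiliar with gene set enrichment, the sketch below shows one of the simplest members of this family of methods: a one-sided hypergeometric (over-representation) test. It is given only as an example of the kind of procedure such a framework compares, not as the framework or a recommended method from the abstract; the function and parameter names are assumptions.

```python
from math import comb


def hypergeom_enrichment_p(n_universe, n_set, n_selected, n_overlap):
    """One-sided hypergeometric p-value for gene set over-representation.

    n_universe: number of genes measured in total.
    n_set:      number of those genes that belong to the gene set.
    n_selected: number of genes called significant.
    n_overlap:  number of significant genes that are in the gene set.
    Returns P(X >= n_overlap) when n_selected genes are drawn without
    replacement from the universe.
    """
    total = comb(n_universe, n_selected)
    upper = min(n_set, n_selected)
    tail = sum(
        comb(n_set, k) * comb(n_universe - n_set, n_selected - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


# Hypothetical numbers: 40 of 200 significant genes fall in a set of 500
# genes, out of 10,000 genes measured.
print(hypergeom_enrichment_p(10_000, 500, 200, 40))
```

The test treats genes as independent and depends on a significance cutoff for individual genes, two of the design choices on which enrichment procedures differ.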
Subjects
Computer Simulation , Oligonucleotide Array Sequence Analysis/methods , Algorithms , Animals , Databases, Genetic , Humans , Models, Statistical
ABSTRACT
Neglected infectious diseases (NIDs) are a persistent cause of death and disability in low-income countries. Currently available drugs and vaccines are often ineffective, costly or associated with severe side-effects. Although the scale of research on NIDs does not reflect their disease burden, there are encouraging signs that NIDs have begun to attract more political and public attention, which has translated into greater awareness and increased investments in NID research by both public and private donors. Using publicly available data, we analysed funding for NID research in the European Union's (EU's) 7th Framework Programme for Research and Technological Development (FP7), which ran from 2007 to 2013. During FP7, the EU provided €169 million for 65 NID research projects, and thereby placed itself among the top global funders of NID research. Average annual FP7 investment in NID research exceeded €24 million, triple that committed by the EU before the launch of FP7. FP7 NID projects involved research teams from 331 different institutions in 72 countries on six continents, underlining the increasingly global nature of European research activities. NID research has remained a priority in the current EU Framework Programme for research and innovation, Horizon 2020, launched in 2014. This has most notably been reflected in the second programme of the European & Developing Countries Clinical Trials Partnership (EDCTP), which provides unprecedented opportunities to advance the clinical development of new medical interventions against NIDs. Europe is thus better positioned than ever before to play a major role in the global fight against NIDs.
ABSTRACT
Expression quantitative trait loci (eQTL) mapping is a widely used technique to uncover regulatory relationships between genes. A range of methodologies have been developed to map links between expression traits and genotypes. The DREAM (Dialogue on Reverse Engineering Assessments and Methods) initiative is a community project to objectively assess the relative performance of different computational approaches for solving specific systems biology problems. The goal of one of the DREAM5 challenges was to reverse-engineer genetic interaction networks from synthetic genetic variation and gene expression data, which simulates the problem of eQTL mapping. In this framework, we proposed an approach whose originality resides in the use of a combination of existing machine learning algorithms (committee). Although it was not the best performer, this method was by far the most precise on average. After the competition, we continued in this direction by evaluating other committees using the DREAM5 data and developed a method that relies on Random Forests and LASSO. It achieved a much higher average precision than the DREAM best performer at the cost of slightly lower average sensitivity.
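The abstract does not state how the committee members are combined. Purely as an illustration of the committee idea, the sketch below merges the edge scores of several hypothetical methods by averaging each candidate edge's rank across methods; the rank-averaging rule, the function name and the example scores are all assumptions, not the authors' procedure.

```python
def committee_rank(score_lists):
    """Order candidate edges by their mean rank across several methods.

    score_lists: list of dicts, one per method, mapping an edge
                 (regulator, target) to a confidence score, where a
                 higher score means more confident. Every dict must
                 score the same set of edges.
    Returns the edges sorted from most to least confident.
    """
    edges = list(score_lists[0])
    mean_rank = {edge: 0.0 for edge in edges}
    for scores in score_lists:
        ordered = sorted(edges, key=lambda e: scores[e], reverse=True)
        for rank, edge in enumerate(ordered, start=1):
            mean_rank[edge] += rank / len(score_lists)
    return sorted(edges, key=lambda e: mean_rank[e])


# Hypothetical scores from three methods for three candidate edges.
e1, e2, e3 = ("g1", "g2"), ("g1", "g3"), ("g2", "g3")
method_a = {e1: 0.9, e2: 0.5, e3: 0.2}
method_b = {e1: 0.8, e2: 0.1, e3: 0.3}
method_c = {e1: 0.4, e2: 0.6, e3: 0.1}
print(committee_rank([method_a, method_b, method_c]))
```

Working on ranks rather than raw scores avoids having to put the outputs of different algorithms on a common scale.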
Subjects
Artificial Intelligence , Chromosome Mapping , Gene Regulatory Networks , Quantitative Trait Loci , Algorithms , Computational Biology/methods , Gene Expression Profiling/methods , Genotype , ROC Curve
ABSTRACT
In recent years, gene interaction networks have increasingly been used for the assessment and interpretation of biological measurements. Knowledge of the interaction partners of an unknown protein allows scientists to understand the complex relationships between genetic products, helps to reveal unknown biological functions and pathways, and gives a more detailed picture of an organism's complexity. Measuring all protein interactions under all relevant conditions is virtually impossible. Hence, computational methods integrating different datasets for predicting gene interactions are needed. However, when integrating different sources one has to account for the fact that some parts of the information may be redundant, which may lead to an overestimation of the true likelihood of an interaction. Our method integrates information derived from three different databases (Bioverse, HiMAP and STRING) for predicting human gene interactions. A Bayesian approach was implemented in order to integrate the different data sources on a common quantitative scale. An important assumption of the Bayesian integration is independence of the input data (features). Our study shows that the conditional dependency cannot be ignored when combining gene interaction databases that rely on partially overlapping input data. In addition, we show how the correlation structure between the databases can be detected and we propose a linear model to correct for this bias. Benchmarking the results against two independent reference data sets shows that the integrated model outperforms the individual datasets. Our method provides an intuitive strategy for weighting the different features while accounting for their conditional dependencies.
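The overestimation described above can be seen in a minimal sketch of naive Bayesian integration, in which the posterior odds of an interaction are the prior odds multiplied by one likelihood ratio per database. The prior and the likelihood ratios below are made-up numbers, and the linear correction model proposed in the abstract is not reproduced here.

```python
import math


def posterior_probability(prior, likelihood_ratios):
    """Naive Bayes posterior probability of an interaction.

    prior:             prior probability that a gene pair interacts.
    likelihood_ratios: one likelihood ratio per evidence source,
                       P(evidence | interaction) / P(evidence | no interaction).
    Assumes the sources are conditionally independent, which is the
    assumption the abstract shows to be violated for overlapping databases.
    """
    log_odds = math.log(prior / (1 - prior))
    log_odds += sum(math.log(lr) for lr in likelihood_ratios)
    return 1 / (1 + math.exp(-log_odds))


# Hypothetical values: a 1% prior and one source with a likelihood ratio of 10.
single = posterior_probability(0.01, [10.0])
# A second database built on the same underlying data adds no new evidence,
# yet under the independence assumption it is counted again.
double_counted = posterior_probability(0.01, [10.0, 10.0])
print(single, double_counted)
```

Counting the redundant source twice raises the posterior well above the single-source value even though no new information was added, which is the bias a dependency-aware weighting has to remove.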
Subjects
Computational Biology/methods , Databases, Genetic , Gene Regulatory Networks , Genome, Human , Protein Interaction Mapping/methods , Bayes Theorem , Benchmarking , Humans , Likelihood Functions , Models, Genetic , Models, Statistical , Proteins/chemistry , Reproducibility of Results , Sequence Alignment
ABSTRACT
Relying on the high affinities of the benz-indolo-azecine LE 300 (1) and the hydroxylated dibenz-azecine LE 404 (2b) for the D1/D5 receptor subtypes, we synthesized methoxylated and hydroxylated derivatives and an indole-N-methylated derivative of 1 (Fig. 1). Hydroxylation of azecine derivatives is beneficial with regard to the affinities and selectivities for all the dopamine receptor subtypes. The 'serotonin-derived' 3-oxygenated target compounds, but not the 11-oxygenated analogues, were superior to the unsubstituted LE 300. 11-Methoxy-7,14-dimethyl-6,7,8,9,14,15-hexahydro-5H-indolo[3,2-f][3]benzazecine (3e) was found to be the most potent antagonist at D2/D3/D4 and D5 receptor subtypes (Ki for D5 = 0.23 nmol) of all known benz-indolo-azecines.