ABSTRACT
MOTIVATION: The volume and complexity of biological data increase rapidly. Many clinical professionals and biomedical researchers without a bioinformatics background are generating big '-omics' data, but do not always have the tools to manage, process or publicly share these data. RESULTS: Here we present MOLGENIS Research, an open-source web-application to collect, manage, analyze, visualize and share large and complex biomedical datasets, without the need for advanced bioinformatics skills. AVAILABILITY AND IMPLEMENTATION: MOLGENIS Research is freely available (open source software). It can be installed from source code (see http://github.com/molgenis), downloaded as a precompiled WAR file (for your own server), set up inside a Docker container (see http://molgenis.github.io), or requested as a Software-as-a-Service subscription. For a public demo instance and complete installation instructions see http://molgenis.org/research.
Subjects
Computational Biology , Software , Algorithms , Genome , Genomics
ABSTRACT
BACKGROUND: Inference of cancer-causing genes and their biological functions is crucial but challenging due to the heterogeneity of somatic mutations. The heterogeneity of somatic mutations reveals that only a handful of oncogenes mutate frequently and a number of cancer-causing genes mutate rarely. RESULTS: We develop a Cytoscape app, named ZDOG, for visualization of the extent to which mutated genes may affect cancer pathways using the dominator tree model. The dominator tree model allows us to examine conveniently the positional importance of a gene in cancer signalling pathways. This tool facilitates the identification of mutated "master" regulators even with low mutation frequency in deregulated signalling pathways. CONCLUSIONS: We have presented a model for facilitating the examination of the extent to which mutation in a gene may affect downstream components in a signalling pathway through its positional information. The model is implemented in a user-friendly Cytoscape app which will be freely available upon publication. AVAILABILITY: Together with a user manual, the ZDOG app is freely available at GitHub (https://github.com/rudi2013/ZDOG). It is also available in the Cytoscape app store (http://apps.cytoscape.org/apps/ZDOG) and users can easily install it using the Cytoscape App Manager.
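The dominator idea mentioned in this abstract can be illustrated with a minimal Python sketch. A node d dominates a node n if every path from the pathway entry point to n passes through d, so a gene that dominates many downstream components holds a positionally important place. The toy pathway, the node names and the `dominators` helper below are assumptions made for illustration only; this is not the ZDOG implementation.

```python
def dominators(graph, root):
    """Return {node: set of its dominators} for all nodes reachable from root.

    Uses the classic iterative data-flow formulation:
    dom(n) = {n} | intersection of dom(p) over all predecessors p of n.
    """
    # Collect the nodes reachable from the root with a depth-first search.
    reachable, seen, stack = [], {root}, [root]
    while stack:
        node = stack.pop()
        reachable.append(node)
        for succ in graph.get(node, ()):
            if succ not in seen:
                seen.add(succ)
                stack.append(succ)

    # Build predecessor lists restricted to reachable nodes.
    preds = {node: [] for node in reachable}
    for node in reachable:
        for succ in graph.get(node, ()):
            preds[succ].append(node)

    # Start from "everything dominates everything" and shrink to a fixed point.
    dom = {node: set(reachable) for node in reachable}
    dom[root] = {root}
    changed = True
    while changed:
        changed = False
        for node in reachable:
            if node == root:
                continue
            new = set.intersection(*(dom[p] for p in preds[node])) | {node}
            if new != dom[node]:
                dom[node] = new
                changed = True
    return dom


# Hypothetical toy signalling pathway; edges point downstream.
pathway = {
    "RECEPTOR": ["PI3K", "RAS"],
    "RAS": ["PI3K"],
    "PI3K": ["AKT"],
    "AKT": ["MTOR", "FOXO"],
}

dom = dominators(pathway, "RECEPTOR")
# For each gene, the set of other nodes it dominates (its downstream "reach").
dominated = {g: {n for n, ds in dom.items() if g in ds and n != g} for g in dom}
```

In this toy graph PI3K dominates AKT, MTOR and FOXO, whereas RAS dominates nothing because PI3K can also be reached directly from the receptor.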
Subjects
Dominant Genes , Neoplasms/genetics , User-Computer Interface , Breast Neoplasms/genetics , Breast Neoplasms/metabolism , Breast Neoplasms/pathology , Female , Humans , Neoplasms/metabolism , Neoplasms/pathology , Phosphatidylinositol 3-Kinases/genetics , Phosphatidylinositol 3-Kinases/metabolism , Proto-Oncogene Proteins c-akt/genetics , Proto-Oncogene Proteins c-akt/metabolism
ABSTRACT
OBJECTIVE: Patients with IBD display substantial heterogeneity in clinical characteristics. We hypothesise that individual differences in the complex interaction of the host genome and the gut microbiota can explain the onset and the heterogeneous presentation of IBD. Therefore, we performed a case-control analysis of the gut microbiota, the host genome and the clinical phenotypes of IBD. DESIGN: Stool samples, peripheral blood and extensive phenotype data were collected from 313 patients with IBD and 582 truly healthy controls, selected from a population cohort. The gut microbiota composition was assessed by tag-sequencing the 16S rRNA gene. All participants were genotyped. We composed genetic risk scores from 11 functional genetic variants proven to be associated with IBD in genes that are directly involved in the bacterial handling in the gut: NOD2, CARD9, ATG16L1, IRGM and FUT2. RESULTS: Strikingly, we observed significant alterations of the gut microbiota of healthy individuals with a high genetic risk for IBD: the IBD genetic risk score was significantly associated with a decrease in the genus Roseburia in healthy controls (false discovery rate 0.017). Moreover, disease location was a major determinant of the gut microbiota: the gut microbiota of patients with colonic Crohn's disease (CD) is different from that of patients with ileal CD, with a decrease in alpha diversity associated with ileal disease (p=3.28×10⁻¹³). CONCLUSIONS: We show for the first time that genetic risk variants associated with IBD influence the gut microbiota in healthy individuals. Roseburia spp. are acetate-to-butyrate converters, and a decrease has already been observed in patients with IBD.
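A genetic risk score of the kind described here is commonly computed as a sum of risk-allele counts per variant, optionally weighted by effect size. The abstract does not say which form the authors used, so the Python sketch below is only one plausible reading. The variant labels, the optional weights and the function name are hypothetical.

```python
def genetic_risk_score(risk_allele_counts, weights=None):
    """Sum risk-allele counts (0, 1 or 2 per variant), optionally weighted.

    risk_allele_counts: {variant label: number of risk alleles carried}
    weights: optional {variant label: weight}; defaults to 1.0 per variant.
    """
    if weights is None:
        weights = {variant: 1.0 for variant in risk_allele_counts}
    score = 0.0
    for variant, count in risk_allele_counts.items():
        if count not in (0, 1, 2):
            raise ValueError(f"{variant}: a diploid genotype carries 0, 1 or 2 risk alleles")
        score += weights[variant] * count
    return score


# Hypothetical participant; labels only name the gene, not a real variant ID.
participant = {
    "NOD2_variant": 1,
    "CARD9_variant": 2,
    "ATG16L1_variant": 0,
    "IRGM_variant": 1,
    "FUT2_variant": 2,
}

unweighted = genetic_risk_score(participant)
```

A weighted score would pass, for example, log odds ratios from an association study as `weights`; those values are not given in the abstract and are not invented here.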
Subjects
Gastrointestinal Microbiome/genetics , Inflammatory Bowel Diseases/genetics , Inflammatory Bowel Diseases/microbiology , Adult , Case-Control Studies , Ulcerative Colitis/genetics , Ulcerative Colitis/microbiology , Ulcerative Colitis/pathology , Crohn Disease/genetics , Crohn Disease/microbiology , Crohn Disease/pathology , Dysbiosis/complications , Dysbiosis/genetics , Dysbiosis/microbiology , Feces/microbiology , Female , Genetic Predisposition to Disease , Host-Pathogen Interactions/genetics , Humans , Inflammatory Bowel Diseases/pathology , Male , Middle Aged , Risk Assessment/methods , Severity of Illness Index
ABSTRACT
OBJECTIVE: Primary sclerosing cholangitis (PSC) is a genetically complex, inflammatory bile duct disease of largely unknown aetiology often leading to liver transplantation or death. Little is known about the genetic contribution to the severity and progression of PSC. The aim of this study is to identify genetic variants associated with PSC disease progression and development of complications. DESIGN: We collected standardised PSC subphenotypes in a large cohort of 3402 patients with PSC. After quality control, we combined 130 422 single nucleotide polymorphisms of all patients (obtained using the Illumina Immunochip) with their disease subphenotypes. Using logistic regression and Cox proportional hazards models, we identified genetic variants associated with binary and time-to-event PSC subphenotypes. RESULTS: We identified genetic variant rs853974 to be associated with liver transplant-free survival (p=6.07×10⁻⁹). Kaplan-Meier survival analysis showed a 50.9% (95% CI 41.5% to 59.5%) transplant-free survival for homozygous AA allele carriers of rs853974 compared with 72.8% (95% CI 69.6% to 75.7%) for GG carriers at 10 years after PSC diagnosis. For the candidate gene in the region, RSPO3, we demonstrated expression in key liver-resident effector cells, such as human and murine cholangiocytes and human hepatic stellate cells. CONCLUSION: We present a large international PSC cohort, and report genetic loci associated with PSC disease progression. For liver transplant-free survival, we identified a genome-wide significant signal and demonstrated expression of the candidate gene RSPO3 in key liver-resident effector cells. This warrants further assessments of the role of this potential key PSC modifier gene.
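The Kaplan-Meier estimate used for transplant-free survival multiplies, at each event time, the fraction of at-risk patients who remain event-free. The following Python sketch shows the product-limit calculation on made-up follow-up data. The numbers and the function name are illustrative assumptions, not data from the PSC cohort.

```python
def kaplan_meier(durations, events):
    """Return [(time, survival probability)] at each distinct event time.

    durations: follow-up time per patient.
    events: 1 if the event (here: transplantation or death) occurred,
            0 if the patient was censored at that time.
    """
    data = sorted(zip(durations, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0
        n_leaving = 0
        # Group all patients sharing this follow-up time.
        while i < len(data) and data[i][0] == t:
            n_events += data[i][1]
            n_leaving += 1
            i += 1
        if n_events:
            survival *= 1.0 - n_events / at_risk
            curve.append((t, survival))
        # Both events and censored patients leave the risk set after time t.
        at_risk -= n_leaving
    return curve


# Made-up follow-up in years for five hypothetical patients.
years = [2, 3, 3, 5, 8]
had_event = [1, 1, 0, 1, 0]
curve = kaplan_meier(years, had_event)
```

Comparing two genotype groups, as in the abstract, would mean computing one such curve per group and reading off the survival probability at 10 years.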
Subjects
Sclerosing Cholangitis/genetics , Sclerosing Cholangitis/pathology , Single Nucleotide Polymorphism/genetics , Thrombospondins/genetics , Adult , Sclerosing Cholangitis/mortality , Cohort Studies , Disease Progression , Female , Humans , Kaplan-Meier Estimate , Logistic Models , Male , Middle Aged , Proportional Hazards Models
ABSTRACT
During a meeting of the SYSGENET working group 'Bioinformatics', currently available software tools and databases for systems genetics in mice were reviewed and the needs for future developments discussed. The group evaluated interoperability and performed initial feasibility studies. To aid future compatibility of software and exchange of already developed software modules, a strong recommendation was made by the group to integrate HAPPY and R/qtl analysis toolboxes, GeneNetwork and XGAP database platforms, and TIQS and xQTL processing platforms. R should be used as the principal computer language for QTL data analysis in all platforms and a 'cloud' should be used for software dissemination to the community. Furthermore, the working group recommended that all data models and software source code should be made visible in public repositories to allow a coordinated effort on the use of common data structures and file formats.
Subjects
Computational Biology/methods , Factual Databases , Algorithms , Animals , Gene Regulatory Networks , Mice/genetics , Quantitative Trait Loci , Software
ABSTRACT
We combined large-scale mRNA expression analysis and gene mapping to identify genes and loci that control hematopoietic stem cell (HSC) function. We measured mRNA expression levels in purified HSCs isolated from a panel of densely genotyped recombinant inbred mouse strains. We mapped quantitative trait loci (QTLs) associated with variation in expression of thousands of transcripts. By comparing the physical transcript position with the location of the controlling QTL, we identified polymorphic cis-acting stem cell genes. We also identified multiple trans-acting control loci that modify expression of large numbers of genes. These groups of coregulated transcripts identify pathways that specify variation in stem cells. We illustrate this concept with the identification of candidate genes involved with HSC turnover. We compared expression QTLs in HSCs and brain from the same mice and identified both shared and tissue-specific QTLs. Our data are accessible through WebQTL, a web-based interface that allows custom genetic linkage analysis and identification of coregulated transcripts.
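The cis/trans distinction described above comes down to comparing where a transcript lies with where its controlling QTL maps. A minimal Python sketch of that comparison follows. The 5 Mb window, the function name and the example coordinates are assumptions chosen for illustration, since the abstract gives no threshold.

```python
def classify_eqtl(transcript_chr, transcript_mb, qtl_chr, qtl_mb, window_mb=5.0):
    """Label an expression QTL as 'cis' or 'trans'.

    An eQTL counts as cis-acting when it maps to the same chromosome as the
    transcript and within window_mb megabases of it; otherwise it is trans.
    The default window is an illustrative assumption.
    """
    same_chromosome = transcript_chr == qtl_chr
    if same_chromosome and abs(transcript_mb - qtl_mb) <= window_mb:
        return "cis"
    return "trans"


# Hypothetical coordinates (chromosome, position in Mb).
nearby = classify_eqtl("7", 45.2, "7", 46.0)
other_chromosome = classify_eqtl("7", 45.2, "12", 36.0)
same_chromosome_far = classify_eqtl("7", 45.2, "7", 120.0)
```

A trans-acting locus that controls many transcripts would show up as many transcripts sharing the same QTL position while lying elsewhere in the genome.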
Subjects
Human Genome , Hematopoietic Stem Cells/cytology , Carrier Proteins/genetics , Humans , Molecular Sequence Data , Quantitative Trait Loci , Messenger RNA/genetics
ABSTRACT
Viruses are a key component of the colon microbiome, but the relationship between virome and colorectal cancer (CRC) remains poorly understood. We seek to identify alterations in the viral community that are characteristic of CRC and examine if they persist after surgery. Forty-nine fecal samples from 25 non-cancer (NC) individuals and 12 CRC patients, before and 6 months after surgery, were collected for metagenomic analysis. The fecal virome of CRC patients demonstrated an increased network connectivity as compared to NC individuals. Co-exclusion of influential viruses to bacterial species associated with healthy gut status was observed in CRC, suggesting an altered virome induced a change in the healthy gut bacteriome. Network analysis revealed lower connectivity within the virome and trans-kingdom interactions in NC. After surgery, the number of strong correlations decreased for trans-kingdom and within the bacteria and virome networks, indicating lower connectivity within the microbiome. Some co-occurrence patterns between dominant viruses and bacteria were also lost after surgery, suggesting a possible return to the healthy state of gut microbiome. Microbial signatures characteristic of CRC include an altered virome besides an altered bacterial composition. Elevated viral correlations and network connectivity were observed in CRC patients relative to healthy individuals, alongside distinct changes in the cross-kingdom correlation network unique to CRC patients. Some patterns of dysbiosis persist after surgery. Future studies should seek to verify if dysbiosis truly persists after surgery in a larger sample size with microbiome data collected at various time points after surgery to explore if there is field-change in the remaining colon, as well as to examine if persistent dysbiosis correlates with patient outcomes.
Subjects
Colorectal Neoplasms , Microbiota , Viruses , Humans , Virome , Dysbiosis/microbiology , Colorectal Neoplasms/surgery , Colorectal Neoplasms/microbiology
ABSTRACT
BACKGROUND: Obesity-associated organ-specific pathological states can ensue from the dysregulation of the functions of the adipose tissues, liver and muscle. However, the influence of genetic differences underlying gross-compositional differences in these tissues is largely unknown. In the present study, the analytical method of ATR-FTIR spectroscopy has been combined with a genetic approach to identify genetic differences responsible for phenotypic alterations in adipose, liver and muscle tissues. RESULTS: Mice from 29 BXD recombinant inbred mouse strains were put on high fat diet and gross-compositional changes in adipose, liver and muscle tissues were measured by ATR-FTIR spectroscopy. The analysis of genotype-phenotype correlations revealed significant quantitative trait loci (QTL) on chromosome 12 for the content of fat and collagen, collagen integrity, and the lipid to protein ratio in adipose tissue and on chromosome 17 for lipid to protein ratio in liver. Using gene expression and sequence information, we suggest Rsad2 (viperin) and Colec11 (collectin-11) on chromosome 12 as potential quantitative trait candidate genes. Rsad2 may act as a modulator of lipid droplet contents and lipid biosynthesis; Colec11 might play a role in apoptotic cell clearance and maintenance of adipose tissue. An increased level of Rsad2 transcripts in adipose tissue of DBA/2J compared to C57BL/6J mice suggests a cis-acting genetic variant leading to differential gene activation. CONCLUSION: The results demonstrate that the analytical method of ATR-FTIR spectroscopy contributed effectively to decomposing the macromolecular composition of tissues that accumulate fat and to linking this information with genetic determinants. The candidate genes in the QTL regions may contribute to obesity-related diseases in humans, in particular if the results can be verified in a bigger BXD cohort.
Subjects
Recombinant DNA/genetics , High-Fat Diet/adverse effects , Genomics , Inbreeding , Quantitative Trait Loci/genetics , Animals , Male , Mice , Phenotype , Fourier Transform Infrared Spectroscopy
ABSTRACT
BACKGROUND: There is strong but mostly circumstantial evidence that genetic factors modulate the severity of influenza infection in humans. Using genetically diverse but fully inbred strains of mice it has been shown that host sequence variants have a strong influence on the severity of influenza A disease progression. In particular, C57BL/6J, the most widely used mouse strain in biomedical research, is comparatively resistant. In contrast, DBA/2J is highly susceptible. RESULTS: To map regions of the genome responsible for differences in influenza susceptibility, we infected a family of 53 BXD-type lines derived from a cross between C57BL/6J and DBA/2J strains with influenza A virus (PR8, H1N1). We monitored body weight, survival, and mean time to death for 13 days after infection. Qivr5 (quantitative trait for influenza virus resistance on chromosome 5) was the largest and most significant QTL for weight loss. The effect of Qivr5 was detectable on day 2 post infection, but was most pronounced on days 5 and 6. Survival rate mapped to Qivr5, but additionally revealed a second significant locus on chromosome 19 (Qivr19). Analysis of mean time to death affirmed both Qivr5 and Qivr19. In addition, we observed several regions of the genome with suggestive linkage. There are potentially complex combinatorial interactions of the parental alleles among loci. Analysis of multiple gene expression data sets and sequence variants in these strains highlights about 30 strong candidate genes across all loci that may control influenza A susceptibility and resistance. CONCLUSIONS: We have mapped influenza susceptibility loci to chromosomes 2, 5, 16, 17, and 19. Body weight and survival loci have a time-dependent profile that presumably reflects the temporal dynamic of the response to infection. We highlight candidate genes in the respective intervals and review their possible biological function during infection.
Subjects
Disease Resistance/genetics , Influenza A Virus H1N1 Subtype/pathogenicity , Orthomyxoviridae Infections/genetics , Quantitative Trait Loci , Alleles , Animals , Body Weight , Chromosome Mapping , Host-Pathogen Interactions , Mice , Inbred C57BL Mice , Inbred DBA Mice , Orthomyxoviridae Infections/virology , Time Factors
ABSTRACT
Rift Valley fever (RVF) is an arthropod-borne viral disease repeatedly reported in many African countries and, more recently, in Saudi Arabia and Yemen. RVF virus (RVFV) primarily infects domesticated ruminants, resulting in miscarriage in pregnant females and death for newborns and young animals. It also has the ability to infect humans, causing a feverish syndrome, meningoencephalitis, or hemorrhagic fever. The various outcomes of RVFV infection in animals and humans argue for the existence of host genetic determinants controlling the disease. We investigated the susceptibility of inbred mouse strains to infection with the virulent RVFV ZH548 strain. Compared with classical BALB/cByJ mice, wild-derived Mus m. musculus MBT/Pas mice exhibited earlier and greater viremia and died sooner, a result in sharp contrast with their resistance to infection with West Nile virus and influenza A. Infection of mouse embryonic fibroblasts (MEFs) from MBT/Pas mice with RVFV also resulted in higher viral production. Microarray and quantitative RT-PCR experiments showed that BALB/cByJ MEFs displayed a significant activation of the type I IFN pathway. In contrast, MBT/Pas MEFs elicited a delayed and partial type I IFN response to RVFV infection. RNA interference-mediated inhibition of genes that were not induced by RVFV in MBT/Pas MEFs increased viral production in BALB/cByJ MEFs, thus demonstrating their functional importance in limiting viral replication. We conclude that the failure of the MBT/Pas murine strain to induce, in due course, a complete innate immune response is instrumental in the selective susceptibility to RVF.
Subjects
Innate Immunity/genetics , Rift Valley Fever/genetics , Rift Valley Fever/immunology , Animals , Animal Disease Models , Fibroblasts/immunology , Fibroblasts/virology , Gene Expression Profiling , Genetic Predisposition to Disease , Immunohistochemistry , Male , Mice , Inbred C57BL Mice , Oligonucleotide Array Sequence Analysis , Reverse Transcriptase Polymerase Chain Reaction
ABSTRACT
Coronavirus disease 2019 (COVID-19), caused by severe acute respiratory syndrome-coronavirus-2 (SARS-CoV-2), has spread across the world since its emergence in late 2019, causing a pandemic that is still ongoing. A great amount of effort has been devoted to understanding the pathogenesis of COVID-19 with the hope of developing better therapeutic strategies. Transcriptome analysis using technologies such as RNA sequencing became a commonly used approach in the study of host immune responses to SARS-CoV-2. Although a substantial amount of information can be gathered from transcriptome analysis, different analysis tools used in these studies may lead to conclusions that differ dramatically from each other. Here, we re-analyzed four RNA-sequencing datasets of COVID-19 samples including human bronchoalveolar lavage fluid, nasopharyngeal swabs, lung biopsy and hACE2 transgenic mice using the same standardized method. The results showed that common features of COVID-19 include upregulation of chemokines including CCL2, CXCL1, and CXCL10, inflammatory cytokine IL-1β and alarmin S100A8/S100A9, which are associated with dysregulated innate immunity marked by abundant neutrophil and mast cell accumulation. Downregulation of chemokine receptor genes that are associated with impaired adaptive immunity such as lymphopenia is another common feature of COVID-19 observed. In addition, a few interferon-stimulated genes but no type I IFN genes were identified to be enriched in COVID-19 samples compared to their respective control in these datasets. These features are in line with results from single-cell RNA sequencing studies in the field. Therefore, our re-analysis of the RNA-seq datasets revealed common features of dysregulated immune responses to SARS-CoV-2 and shed light on the pathogenesis of COVID-19.
ABSTRACT
Background: High emotional or psychophysical stress levels have been correlated with an increased risk and progression of various diseases. How stress impacts the gut microbiota to influence metabolism and subsequent cancer progression is unclear. Methods: Feces and serum samples from BALB/c ANXA1+/+ and ANXA1-/- mice with or without chronic restraint stress were used for 16S rRNA gene sequencing and GC-MS metabolomics analysis to investigate the effect of stress on microbiome and metabolomics during stress and breast tumorigenesis. Breast tumor samples from stressed and non-stressed mice were used to perform Whole-Genome Bisulfite Sequencing (WGBS) and RNAseq analysis to construct the potential network from candidate hub genes. Finally, machine learning and integrated analysis were used to map the axis from chronic restraint stress to breast cancer development. Results: We report that chronic stress promotes breast tumor growth via a stress-microbiome-metabolite-epigenetic-oncology (SMMEO) axis. Chronic restraint stress in mice alters the microbiome composition and fatty acid metabolism and induces an epigenetic signature in tumors xenografted after stress. Subsequent machine learning and systemic modeling analyses identified a significant correlation among microbiome composition, metabolites, and differentially methylated regions in stressed tumors. Moreover, silencing Annexin-A1 inhibits the changes in the gut microbiome and fatty acid metabolism after stress as well as basal and stress-induced tumor growth. Conclusions: These data support a physiological axis linking the microbiome and metabolites to cancer epigenetics and inflammation. The identification of this axis could propel the next phase of experimental discovery in further understanding the underlying molecular mechanism of tumorigenesis caused by physiological stress.
Subjects
Annexin A1 , Microbiota , Neoplasms , Animals , Carcinogenesis/genetics , Genetic Epigenesis , Fatty Acids/pharmacology , Metabolome , Metabolomics , Mice , Neoplasms/genetics , 16S Ribosomal RNA/genetics
ABSTRACT
BACKGROUND: Regulatory T cells (Tregs) play an essential role in the control of the immune response. Treg cells represent important targets for therapeutic interventions of the immune system. Therefore, it will be very important to understand in more detail which genes are specifically activated in Treg cells versus T helper (Th) cells, and which gene regulatory circuits may be involved in specifying and maintaining Treg cell homeostasis. RESULTS: We isolated Treg and Th cells from a genetically diverse family of 31 BXD type recombinant inbred strains and the fully inbred parental strains of this family, C57BL/6J and DBA/2J. Subsequently, genome-wide gene expression studies were performed on the isolated Treg and Th cells. A comparative analysis of the transcriptomes of these cell populations allowed us to identify many novel differentially expressed genes. Analysis of cis- and trans-expression Quantitative Trait Loci (eQTLs) highlighted common and unique regulatory mechanisms that are active in the two cell types. Trans-eQTL regions were found for the Treg functional genes Nrp1, Stat3 and Ikzf4. Analyses of the respective QTL intervals suggested several candidate genes that may be involved in regulating these genes in Treg cells. Similarly, possible candidate genes were found which may regulate the expression of F2rl1, Ctla4 and Klrb1f. In addition, we identified a focused group of candidate genes that may be important for the maintenance of self-tolerance and the prevention of allergy. CONCLUSIONS: Variation of expression across the strains allowed us to find many novel gene-interaction networks in both T cell subsets. In addition, these two data sets enabled us to identify many differentially expressed genes and to nominate candidate genes that may have important functions for the maintenance of self-tolerance and the prevention of allergy.
Subjects
Autoimmune Diseases/genetics , Quantitative Trait Loci , Helper-Inducer T-Lymphocytes/metabolism , Regulatory T-Lymphocytes/metabolism , Animals , Mice , Inbred C57BL Mice , Inbred DBA Mice
ABSTRACT
BACKGROUND: The lung is critical in surveillance and initial defense against pathogens. In humans, as in mice, individual genetic differences strongly modulate pulmonary responses to infectious agents, severity of lung disease, and potential allergic reactions. In a first step towards understanding genetic predisposition and pulmonary molecular networks that underlie individual differences in disease vulnerability, we performed a global analysis of normative lung gene expression levels in inbred mouse strains and a large family of BXD strains that are widely used for systems genetics. Our goal is to provide a key community resource on the genetics of the normative lung transcriptome that can serve as a foundation for experimental analysis and allow predicting genetic predisposition and response to pathogens, allergens, and xenobiotics. METHODS: Steady-state polyA+ mRNA levels were assayed across a diverse and fully genotyped panel of 57 isogenic strains using the Affymetrix M430 2.0 array. Correlations of expression levels between genes were determined. Global expression QTL (eQTL) analysis and network covariance analysis were performed using tools and resources in GeneNetwork (http://www.genenetwork.org). RESULTS: Expression values were highly variable across strains and in many cases exhibited a high heritability factor. Several genes which showed a restricted expression to lung tissue were identified. Using correlations between gene expression values across all strains, we defined and extended memberships of several important molecular networks in the lung. Furthermore, we were able to extract signatures of immune cell subpopulations and characterize co-variation and shared genetic modulation. Known QTL regions for respiratory infection susceptibility were investigated and several cis-eQTL genes were identified. Numerous cis- and trans-regulated transcripts and chromosomal intervals with strong regulatory activity were mapped. The Cyp1a1 P450 transcript had a strong trans-acting eQTL (LOD 11.8) on Chr 12 at 36 ± 1 Mb. This interval contains the transcription factor Ahr that has a critical missense allele in the DBA/2J haplotype and evidently modulates transcriptional activation by AhR. CONCLUSIONS: Large-scale gene expression analyses in genetic reference populations revealed lung-specific and immune-cell gene expression profiles and suggested specific gene regulatory interactions.
Subjects
Gene Expression Profiling , Gene Regulatory Networks , Lung/chemistry , Messenger RNA/analysis , Animals , B-Lymphocytes/chemistry , B-Lymphocytes/immunology , Female , Gene Expression Profiling/methods , Gene Expression Regulation , Genetic Predisposition to Disease , Heredity , Lod Score , Lung/cytology , Lung/immunology , Male , Mice , Inbred BALB C Mice , Inbred C57BL Mice , Inbred DBA Mice , Oligonucleotide Array Sequence Analysis , Phenotype , Heritable Quantitative Trait , Species Specificity , T-Lymphocytes/chemistry , T-Lymphocytes/immunology
ABSTRACT
BACKGROUND: Quantitative trait locus (QTL) mapping identifies genomic regions that likely contain genes regulating a quantitative trait. However, QTL regions may encompass tens to hundreds of genes. To find the most promising candidate genes that regulate the trait, the biologist typically collects information from multiple resources about the genes in the QTL interval. This process is very laborious and time consuming. RESULTS: QTLminer is a bioinformatics tool that automatically performs QTL region analysis. It is available in GeneNetwork and it integrates information such as gene annotation, gene expression and sequence polymorphisms for all the genes within a given genomic interval. CONCLUSIONS: QTLminer substantially speeds up discovery of the most promising candidate genes within a QTL region.
Subjects
Genome , Quantitative Trait Loci/genetics , Software , Computational Biology , Molecular Sequence Annotation/methods , Single Nucleotide Polymorphism
ABSTRACT
BACKGROUND: The analysis of expression quantitative trait loci (eQTL) is a potentially powerful way to detect transcriptional regulatory relationships at the genomic scale. However, eQTL data sets often go underexploited because legacy QTL methods are used to map the relationship between the expression trait and genotype. Often these methods are inappropriate for complex traits such as gene expression, particularly in the case of epistasis. RESULTS: Here we compare legacy QTL mapping methods with several modern multi-locus methods and evaluate their ability to produce eQTL that agree with independent external data in a systematic way. We found that the modern multi-locus methods (Random Forests, sparse partial least squares, lasso, and elastic net) clearly outperformed the legacy QTL methods (Haley-Knott regression and composite interval mapping) in terms of biological relevance of the mapped eQTL. In particular, we found that our new approach, based on Random Forests, showed superior performance among the multi-locus methods. CONCLUSIONS: Benchmarks based on the recapitulation of experimental findings provide valuable insight when selecting the appropriate eQTL mapping method. Our battery of tests suggests that Random Forests map eQTL that are more likely to be validated by independent data, when compared to competing multi-locus and legacy eQTL mapping methods.
Subjects
Chromosome Mapping/methods , Genetic Databases , Quantitative Trait Loci/genetics , Animals , Bias , Computer Simulation , Gene Expression Regulation , Hematopoietic Stem Cells/metabolism , Hippocampus/metabolism , Lung/metabolism , Mice , Genetic Models , Mutation/genetics , Single Nucleotide Polymorphism/genetics , Heritable Quantitative Trait , Sample Size , Regulatory T-Lymphocytes/metabolism
ABSTRACT
MOTIVATION: Affymetrix arrays use multiple probes per gene to measure mRNA abundances. Standard software takes averages over probes. Important information may be lost if polymorphisms in the mRNA affect the hybridization of individual probes. RESULTS: We present custom software to analyze genetical genomics experiments in human, mouse and other organisms: (i) an R package providing functions for QTL analysis at the individual probe level and (ii) Perl scripts providing custom tracks in the UCSC Genome Browser to check for sequence polymorphisms in probe regions. AVAILABILITY: http://gbic.biol.rug.nl/supplementary.
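The point about averaging over probes can be made concrete with a small Python sketch: if a polymorphism disrupts hybridization of one probe, that probe shows a genotype effect that the other probes of the same gene do not, and the probe-set average dilutes it. The software described in the abstract is an R package and Perl scripts; the code below is not that software. The data, the threshold and the function names are hypothetical.

```python
from statistics import mean, median


def probe_effects(intensities, genotypes):
    """Per-probe genotype effect: mean signal in 'B' samples minus 'A' samples.

    intensities: {probe name: [signal per sample]}
    genotypes: list of 'A' or 'B' per sample at the marker being tested.
    """
    effects = {}
    for probe, values in intensities.items():
        a = [v for v, g in zip(values, genotypes) if g == "A"]
        b = [v for v, g in zip(values, genotypes) if g == "B"]
        effects[probe] = mean(b) - mean(a)
    return effects


def deviating_probes(effects, threshold=1.0):
    """Probes whose effect differs from the probe-set median by more than threshold."""
    typical = median(effects.values())
    return [p for p, e in effects.items() if abs(e - typical) > threshold]


# Made-up log-scale signals for one probe set across four samples.
genotypes = ["A", "A", "B", "B"]
intensities = {
    "probe1": [8.0, 8.2, 8.1, 8.1],
    "probe2": [7.5, 7.5, 7.6, 7.4],
    "probe3": [9.0, 9.0, 6.0, 6.0],  # behaves as if a SNP disrupts binding in 'B'
    "probe4": [8.0, 8.0, 8.0, 8.0],
}

effects = probe_effects(intensities, genotypes)
suspect = deviating_probes(effects)
```

Here only `probe3` stands out, which is the kind of probe one would then check against known sequence polymorphisms in the probe region.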
Subjects
Algorithms , Chromosome Mapping/methods , DNA Mutational Analysis/methods , DNA Probes/genetics , Oligonucleotide Array Sequence Analysis/methods , Quantitative Trait Loci/genetics , Software , Animals , Humans , Mice
ABSTRACT
Background and Aims: Crohn's disease [CD] is a chronic inflammatory disease with unpredictable behaviour. More than half of CD patients eventually develop complications such as stenosis, for which they then require endoscopic dilatation or surgery, as no anti-fibrotic drugs are currently available. We aim to identify disease-modifying genes associated with fibrostenotic CD. Methods: We performed a within-case analysis comparing 'extreme phenotypes' using the Immunochip and replication of the top single nucleotide polymorphisms [SNPs] with Agena Bioscience in two independent case-control cohorts totalling 322 cases with fibrostenosis [recurrent after surgery] and 619 cases with purely inflammatory CD. Results: Combined meta-analysis resulted in a genome-wide significant signal for SNP rs11861007 [p = 6.09 × 10⁻¹¹], located on chromosome 16, in lncRNA RP11-679B19.1, an lncRNA of unknown function, and close to exon 9 of the WWOX gene, which codes for WW domain-containing oxidoreductase. We analysed mRNA expression of TGF-β and downstream genes in ileocecal resection material from ten patients with and without the WWOX risk allele. Patients carrying the risk allele [A] showed enhanced colonic expression of TGF-β compared to patients homozygous for the wild-type [G] allele [p = 0.0079]. Conclusion: We have identified a variant in WWOX and in lncRNA RP11-679B19.1 as a disease-modifying genetic variant associated with recurrent fibrostenotic CD and replicated this association in an independent cohort. WWOX can potentially play a crucial role in fibrostenosis in CD, being positioned at the crossroads of inflammation and fibrosis.
Subjects
Crohn Disease/genetics , Crohn Disease/metabolism , Messenger RNA/metabolism , Tumor Suppressor Proteins/genetics , WW Domain-Containing Oxidoreductase/genetics , Adolescent , Adult , Alleles , Case-Control Studies , Pathologic Constriction/etiology , Crohn Disease/complications , Female , Fibrosis , Genome-Wide Association Study , Genomics , Humans , Male , Phenotype , Single Nucleotide Polymorphism , Long Noncoding RNA/genetics , Transforming Growth Factor beta/genetics , Young Adult
ABSTRACT
Changes in the gut microbiota have been associated with two of the most common gastrointestinal diseases, inflammatory bowel disease (IBD) and irritable bowel syndrome (IBS). Here, we performed a case-control analysis using shotgun metagenomic sequencing of stool samples from 1792 individuals with IBD and IBS compared with control individuals in the general population. Despite substantial overlap between the gut microbiome of patients with IBD and IBS compared with control individuals, we were able to use gut microbiota composition differences to distinguish patients with IBD from those with IBS. By combining species-level profiles and strain-level profiles with bacterial growth rates, metabolic functions, antibiotic resistance, and virulence factor analyses, we identified key bacterial species that may be involved in two common gastrointestinal diseases.
Subjects
Gastrointestinal Microbiome , Inflammatory Bowel Diseases/microbiology , Irritable Bowel Syndrome/microbiology , Bacteria/growth & development , Bacteria/pathogenicity , Biodiversity , Case-Control Studies , Microbial Drug Resistance , Feces/microbiology , Gastrointestinal Microbiome/genetics , Humans , Metagenome , Biological Models , Phenotype , Principal Component Analysis , ROC Curve , Species Specificity , Virulence
ABSTRACT
BACKGROUND: The Affymetrix GeneChip technology uses multiple probes per gene to measure its expression level. Individual probe signals can vary widely, which hampers proper interpretation. This variation can be caused by probes that do not properly match their target gene or that match multiple genes. To determine the accuracy of Affymetrix arrays, we developed an extensive verification protocol that, for mouse arrays, incorporates the NCBI RefSeq, NCBI UniGene Unique, NIA Mouse Gene Index, and UCSC mouse genome databases. RESULTS: Applying this protocol to Affymetrix Mouse Genome arrays (the earlier U74Av2 and the newer 430 2.0 array), the number of sequence-verified probes with perfect matches was no less than 85% and 95%, respectively; and for 74% and 85% of the probe sets all probes were sequence verified. The latter percentages increased to 80% and 94% after discarding one or two unverifiable probes per probe set, and even further to 84% and 97% when, in addition, allowing for one or two mismatches between probe and target gene. Similar results were obtained for other mouse arrays, as well as for human and rat arrays. Based on these data, refined chip definition files for all arrays are provided online. Researchers can choose the version appropriate for their study to (re)analyze expression data. CONCLUSION: The accuracy of Affymetrix probe sequences is higher than previously reported, particularly on newer arrays. Yet, refined probe set definitions have clear effects on the detection of differentially expressed genes. We demonstrate that the interpretation of the results of Affymetrix arrays is improved when the new chip definition files are used.
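The effect of the two relaxations described in the abstract above (discarding a few unverifiable probes per probe set, and tolerating a few mismatches) can be illustrated with a minimal sketch. This is not the authors' protocol, which draws on the RefSeq, UniGene, NIA and UCSC databases; the sequences, gene names and function names below are hypothetical toy values chosen only to show how the fraction of fully verified probe sets changes with the two thresholds.

```python
# Toy illustration only: sequences and names are hypothetical, and real
# probes are 25-mers checked against full transcript databases.

def min_mismatches(probe, target):
    """Fewest mismatches of probe against any same-length window of target."""
    n = len(probe)
    if len(target) < n:
        return n  # probe cannot be placed; count every base as a mismatch
    return min(
        sum(a != b for a, b in zip(probe, target[i:i + n]))
        for i in range(len(target) - n + 1)
    )

def probe_set_ok(probes, target, max_mismatches=0, max_discarded=0):
    """A probe set passes if at most max_discarded probes fail verification."""
    failed = sum(min_mismatches(p, target) > max_mismatches for p in probes)
    return failed <= max_discarded

def fraction_ok(probe_sets, targets, **thresholds):
    """Fraction of probe sets that pass under the given thresholds."""
    passed = sum(
        probe_set_ok(probes, targets[gene], **thresholds)
        for gene, probes in probe_sets.items()
    )
    return passed / len(probe_sets)

targets = {
    "geneA": "ACGTACGTGGCCTTAA",
    "geneB": "TTGACCATGCAAGGTC",
}
probe_sets = {
    "geneA": ["ACGTAC", "GGCCTT", "GTGGCC"],  # all perfect matches
    "geneB": ["TTGACC", "ATGCAT", "GGGGGG"],  # one near match, one non-match
}

strict = fraction_ok(probe_sets, targets)
relaxed = fraction_ok(probe_sets, targets, max_mismatches=1, max_discarded=1)
print(strict, relaxed)
```

Under the strict setting only the probe set whose probes all match perfectly passes; allowing one mismatch and one discarded probe lets the second toy probe set pass as well, mirroring the direction of the percentages reported in the abstract.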