ABSTRACT
To address the challenge of translating genetic discoveries for type 1 diabetes (T1D) into mechanistic insight, we have developed the T1D Knowledge Portal (T1DKP), an open-access resource for hypothesis development and target discovery in T1D.
Subject(s)
Diabetes Mellitus, Type 1; Humans; Diabetes Mellitus, Type 1/genetics; Genomics; Human Genetics
ABSTRACT
Drug development and biological discovery require effective strategies to map existing genetic associations to causal genes. To approach this problem, we selected 12 common diseases and quantitative traits for which highly powered genome-wide association studies (GWAS) were available. For each disease or trait, we systematically curated positive control gene sets from Mendelian forms of the disease and from targets of medicines used for disease treatment. We found that these positive control genes were highly enriched in the vicinity of GWAS-associated single-nucleotide variants (SNVs). We then quantitatively assessed the contribution of commonly used genomic features, including open chromatin maps, expression quantitative trait loci (eQTL), and chromatin conformation data. Using these features, we trained and validated an Effector Index (Ei) to map target genes for these 12 common diseases and traits. Ei demonstrated high predictive performance, both in cross-validation on the training set and on an independently derived set for type 2 diabetes. Key predictive features included coding or transcript-altering SNVs, distance to gene, and open chromatin-based metrics. This work outlines a simple, understandable, and systematic approach to prioritizing genes at GWAS loci for functional follow-up and drug development.
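The workflow summarized above (label positive-control genes, compute genomic features for each candidate gene, fit a classifier, then rank genes at each locus) can be illustrated with a toy sketch. This is not the Ei model itself: it swaps in a plain logistic regression fitted by gradient descent, and the feature encoding, gene records, and helper names such as `featurize` and `rank_locus` are hypothetical.

```python
import math

# Hypothetical toy gene records; the three features loosely mirror those named
# above (coding/transcript-altering SNV, distance to gene, open chromatin).
TRAIN_GENES = [
    {"name": "POS1", "coding_snv": True, "distance_kb": 0, "open_chromatin": 3},
    {"name": "POS2", "coding_snv": False, "distance_kb": 5, "open_chromatin": 2},
    {"name": "NEG1", "coding_snv": False, "distance_kb": 250, "open_chromatin": 0},
    {"name": "NEG2", "coding_snv": False, "distance_kb": 120, "open_chromatin": 1},
]
TRAIN_LABELS = [1, 1, 0, 0]  # 1 = positive-control gene, 0 = other gene at the locus


def featurize(gene):
    """Map a gene record to [bias, coding flag, closeness, open-chromatin count]."""
    return [
        1.0,
        1.0 if gene["coding_snv"] else 0.0,
        -math.log1p(gene["distance_kb"]),  # larger value = closer to the lead SNV
        float(gene["open_chromatin"]),
    ]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic(genes, labels, lr=0.1, epochs=2000):
    """Fit logistic-regression weights by full-batch gradient descent."""
    xs = [featurize(g) for g in genes]
    weights = [0.0] * len(xs[0])
    n = len(xs)
    for _ in range(epochs):
        grad = [0.0] * len(weights)
        for x, y in zip(xs, labels):
            err = sigmoid(sum(w * xi for w, xi in zip(weights, x))) - y
            for j, xi in enumerate(x):
                grad[j] += err * xi / n
        weights = [w - lr * g for w, g in zip(weights, grad)]
    return weights


def score(weights, gene):
    """Predicted probability that the gene is the effector gene."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, featurize(gene))))


def rank_locus(weights, genes):
    """Gene names at one locus, ordered from most to least likely effector."""
    ranked = sorted(genes, key=lambda g: score(weights, g), reverse=True)
    return [g["name"] for g in ranked]


if __name__ == "__main__":
    w = train_logistic(TRAIN_GENES, TRAIN_LABELS)
    locus = [
        {"name": "FAR_GENE", "coding_snv": False, "distance_kb": 300, "open_chromatin": 0},
        {"name": "NEAR_GENE", "coding_snv": True, "distance_kb": 2, "open_chromatin": 2},
    ]
    print(rank_locus(w, locus))
```

A linear model is used here only because its weights are easy to inspect, which echoes the emphasis on an understandable approach; the actual Ei features and model may differ.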
Subject(s)
Diabetes Mellitus, Type 2; Genome-Wide Association Study; Chromatin/genetics; Diabetes Mellitus, Type 2/genetics; Genetic Predisposition to Disease; Humans; Polymorphism, Single Nucleotide; Quantitative Trait Loci
ABSTRACT
Translating genetic discoveries for type 1 diabetes (T1D) into mechanistic insight can reveal novel biology and therapeutic targets but remains a major challenge. We developed the T1D Knowledge Portal (T1DKP), a disease-specific resource of genetic and functional annotation data that enables users to develop hypotheses for T1D research and target discovery. The T1DKP can be used to query genes and genomic regions for genetic associations, identify epigenomic features, access results of bioinformatic analyses, and obtain expert-curated resources. The T1DKP is available at http://t1d.hugeamp.org.
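Conceptually, a gene or region query of the kind described above filters variant-level association results by genomic coordinates and a significance threshold. The sketch below does this over an in-memory list; it does not use or imitate the T1DKP's own interface, and the record layouts, the 50 kb flank, and the variant and gene names are hypothetical. The 5 × 10⁻⁸ cutoff is the conventional genome-wide significance threshold.

```python
from dataclasses import dataclass

# Conventional genome-wide significance threshold for GWAS.
GENOME_WIDE_SIGNIFICANCE = 5e-8


@dataclass(frozen=True)
class Association:
    """One variant-level association result (hypothetical record layout)."""
    variant_id: str
    chrom: str
    pos: int
    p_value: float


@dataclass(frozen=True)
class Gene:
    """Gene coordinates (hypothetical record layout)."""
    name: str
    chrom: str
    start: int
    end: int


def query_region(associations, chrom, start, end, p_threshold=GENOME_WIDE_SIGNIFICANCE):
    """Associations on chrom within [start, end] with p < p_threshold, strongest first."""
    hits = [
        a for a in associations
        if a.chrom == chrom and start <= a.pos <= end and a.p_value < p_threshold
    ]
    return sorted(hits, key=lambda a: a.p_value)


def query_gene(associations, genes_by_name, name, flank=50_000,
               p_threshold=GENOME_WIDE_SIGNIFICANCE):
    """Associations within `flank` bp of a named gene; raises KeyError if unknown."""
    gene = genes_by_name[name]
    return query_region(associations, gene.chrom, max(0, gene.start - flank),
                        gene.end + flank, p_threshold)


if __name__ == "__main__":
    # Made-up records for illustration only.
    associations = [
        Association("var_a", "chr6", 1_000_000, 1e-12),
        Association("var_b", "chr6", 1_030_000, 3e-9),
        Association("var_c", "chr6", 1_040_000, 0.01),
        Association("var_d", "chr2", 1_000_000, 1e-20),
    ]
    genes = {"GENE_X": Gene("GENE_X", "chr6", 1_010_000, 1_020_000)}
    print([a.variant_id for a in query_gene(associations, genes, "GENE_X")])
```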
ABSTRACT
We have experimentally and computationally defined a set of genes that form a conserved metabolic module in the alpha-proteobacterium Caulobacter crescentus and used this module to illustrate a schema for the propagation of pathway-level annotation across bacterial genera. Applying comprehensive forward and reverse genetic methods and genome-wide transcriptional analysis, we (1) confirmed the presence of genes involved in catabolism of the abundant environmental sugar myo-inositol, (2) defined an operon encoding an ABC-family myo-inositol transmembrane transporter, and (3) identified a novel myo-inositol regulator protein and a cis-acting regulatory motif that control expression of genes in this metabolic module. Despite being encoded at non-contiguous loci on the C. crescentus chromosome, these myo-inositol catabolic enzymes and transporter proteins form a tightly linked functional group in a computationally inferred network of protein associations. Primary sequence comparison was not sufficient to confidently extend annotation of all components of this novel metabolic module to related bacterial genera. Consequently, we implemented the Graemlin multiple-network alignment algorithm to generate cross-species predictions of genes involved in myo-inositol transport and catabolism in other alpha-proteobacteria. Although the chromosomal organization of genes in this functional module varied between species, the upstream regions of genes in this aligned network were enriched for the same palindromic cis-regulatory motif identified experimentally in C. crescentus. Transposon disruption of the operon encoding the computationally predicted ABC myo-inositol transporter of Sinorhizobium meliloti abolished growth on myo-inositol as the sole carbon source, confirming our cross-genera functional prediction.
Thus, we have defined regulatory, transport, and catabolic genes and a cis-acting regulatory sequence that form a conserved module required for myo-inositol metabolism in select alpha-proteobacteria. Moreover, this study describes a forward validation of gene-network alignment and illustrates a strategy for reliably transferring pathway-level annotation across bacterial species.
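Checking upstream regions for a palindromic cis-regulatory motif, as described above, reduces to scanning sequences for near matches to a consensus and comparing how often each set of regions carries a site. The minimal sketch below assumes a hypothetical 10-bp palindromic consensus (`ACGTTAACGT`), not the motif identified in C. crescentus, a simple mismatch-count match rule rather than a position weight matrix, and made-up sequences.

```python
# Hypothetical palindromic consensus; NOT the regulatory motif reported above.
MOTIF = "ACGTTAACGT"
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def is_palindromic(motif):
    """A DNA palindrome reads the same on both strands (motif == its reverse complement)."""
    return motif == reverse_complement(motif)


def hamming(a, b):
    """Number of mismatched positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def find_sites(seq, motif, max_mismatches=1):
    """Start positions where seq matches motif with at most max_mismatches.

    For a palindromic motif, a reverse-strand match is also a forward-strand
    match, so scanning one strand is sufficient.
    """
    k = len(motif)
    return [
        i for i in range(len(seq) - k + 1)
        if hamming(seq[i:i + k], motif) <= max_mismatches
    ]


def fraction_with_site(seqs, motif, max_mismatches=1):
    """Fraction of sequences containing at least one motif site."""
    return sum(1 for s in seqs if find_sites(s, motif, max_mismatches)) / len(seqs)


if __name__ == "__main__":
    upstream = ["TTTACGTTAACGTGGG", "CCACGTTTACGTAA"]  # made-up upstream regions
    background = ["GGGGCCCCGGGG", "ATATATATATAT"]      # made-up control regions
    print(is_palindromic(MOTIF))
    print(fraction_with_site(upstream, MOTIF), fraction_with_site(background, MOTIF))
```

A real enrichment analysis would compare these fractions statistically over many regions and typically score sites with a position weight matrix; the mismatch rule only keeps the illustration short.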
Subject(s)
Bacterial Proteins/metabolism; Caulobacter crescentus/metabolism; Conserved Sequence; Alphaproteobacteria/genetics; Alphaproteobacteria/metabolism; Bacterial Proteins/chemistry; Bacterial Proteins/genetics; Base Sequence; Binding Sites; Caulobacter crescentus/chemistry; Caulobacter crescentus/genetics; Computational Biology; Gene Expression Regulation, Bacterial; Gene Regulatory Networks; Genome, Bacterial; Inositol/metabolism; Molecular Sequence Data; Mutagenesis, Insertional; Operon
ABSTRACT
The collection of multiple genome-scale datasets is now routine, and the frontier of research in systems biology has shifted accordingly. Rather than clustering a single dataset to produce a static map of functional modules, the focus today is on data integration, network alignment, interactive visualization and ontological markup. Because of the intrinsic noisiness of high-throughput measurements, statistical methods have been central to this effort. In this review, we briefly survey available datasets in functional genomics, review methods for data integration and network alignment, and describe recent work on using network models to guide experimental validation. We explain how the integration and validation steps spring from a Bayesian description of network uncertainty, and conclude by describing an important near-term milestone for systems biology: the construction of a set of rich reference networks for key model organisms.
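The Bayesian view of data integration mentioned above is often illustrated with a naive Bayes combination of evidence: each dataset contributes a likelihood ratio LR = P(evidence | linked) / P(evidence | not linked) for a functional link between two genes, and, assuming the datasets are conditionally independent given the true link status, the posterior odds equal the prior odds multiplied by the product of the likelihood ratios. The sketch below is that textbook formulation, not necessarily the specific models surveyed in the review; the gold-standard counts and the prior are invented for illustration.

```python
import math


def likelihood_ratio(pos_with, pos_total, neg_with, neg_total, pseudocount=1.0):
    """LR = P(evidence | linked) / P(evidence | not linked), estimated from
    gold-standard linked and unlinked gene pairs with additive smoothing."""
    p_given_linked = (pos_with + pseudocount) / (pos_total + 2 * pseudocount)
    p_given_unlinked = (neg_with + pseudocount) / (neg_total + 2 * pseudocount)
    return p_given_linked / p_given_unlinked


def posterior_link_probability(prior, likelihood_ratios):
    """Combine evidence in log-odds space:
    log O_post = log O_prior + sum(log LR_i), assuming conditional independence."""
    log_odds = math.log(prior / (1.0 - prior))
    log_odds += sum(math.log(lr) for lr in likelihood_ratios)
    # Numerically stable conversion from log-odds back to a probability.
    if log_odds >= 0:
        return 1.0 / (1.0 + math.exp(-log_odds))
    e = math.exp(log_odds)
    return e / (1.0 + e)


if __name__ == "__main__":
    # Hypothetical gold-standard counts for two datasets
    # (e.g., a co-expression compendium and an interaction screen).
    lr_coexpr = likelihood_ratio(pos_with=80, pos_total=100, neg_with=50, neg_total=1000)
    lr_screen = likelihood_ratio(pos_with=30, pos_total=100, neg_with=10, neg_total=1000)
    print(posterior_link_probability(prior=0.01, likelihood_ratios=[lr_coexpr, lr_screen]))
```

Working in log-odds keeps the product of many likelihood ratios numerically well behaved; correlated datasets violate the independence assumption and would make such a posterior overconfident.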