ABSTRACT
DNA variants that affect alternative splicing and the relative quantities of different gene transcripts have been shown to be risk alleles for some Mendelian diseases. However, for complex traits characterized by a low odds ratio for any single contributing variant, very few studies have investigated the contribution of splicing variants. The overarching goal of this study is to discover and characterize the role that variants affecting alternative splicing may play in the genetic etiology of complex traits, which include a significant number of the common human diseases. Specifically, we hypothesize that single nucleotide polymorphisms (SNPs) in splicing regulatory elements can be characterized in silico to identify variants affecting splicing, and that these variants may contribute to the etiology of complex diseases as well as the inter-individual variability in the ratios of alternative transcripts. We leverage high-throughput expression profiling to 1) experimentally validate our in silico predictions of skipped exons and 2) characterize the molecular role of intronic genetic variations in alternative splicing events in the context of complex human traits and diseases. We propose that intronic SNPs play a role as genetic regulators within splicing regulatory elements and show that their associated exon skipping events can affect protein domains and structure. We find that SNPs we would predict to affect exon skipping are enriched among the set of SNPs reported to be associated with complex human traits.
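The enrichment claim in the final sentence can be illustrated with a one-sided hypergeometric test. This is a minimal sketch only: the abstract does not say which statistical test the authors used, and the function name, parameter names, and toy numbers below are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(population, successes, draws, observed):
    """Upper-tail probability P(X >= observed) for a hypergeometric variable.

    Assumed mapping (not taken from the source):
      population = all SNPs considered
      successes  = SNPs predicted to affect exon skipping
      draws      = SNPs reported as trait-associated
      observed   = trait-associated SNPs that are also predicted
    """
    total = comb(population, draws)
    upper = min(successes, draws)
    favourable = sum(
        comb(successes, i) * comb(population - successes, draws - i)
        for i in range(observed, upper + 1)
    )
    return favourable / total


# Toy numbers for illustration only, not data from the study.
p_value = hypergeom_enrichment_p(population=10, successes=4, draws=3, observed=2)
print(f"P(X >= 2) = {p_value:.4f}")
```

A small p-value would indicate that predicted splicing SNPs occur among trait-associated SNPs more often than expected by chance under this model.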
Subjects
Alternative Splicing , Exons , Single Nucleotide Polymorphism , Heritable Quantitative Trait , Computational Biology/methods , Cyclin-Dependent Kinases/chemistry , Cyclin-Dependent Kinases/genetics , Genetic Predisposition to Disease , Humans , Introns , Molecular Models , Phenotype , Protein Conformation , Proteins/chemistry , Proteins/genetics , Quantitative Trait Loci , RNA Isoforms
ABSTRACT
BACKGROUND: The current state of the art for measuring stromal response to targeted therapy requires burdensome and rate-limiting quantitative histology. Transcriptome measures are increasingly affordable and provide an opportunity for developing a stromal versus cancer ratio in xenograft models. In these models, human cancer cells are transplanted into mouse host tissues (stroma) and together coevolve into a tumour microenvironment. However, profiling the mouse or human component separately remains problematic. Indeed, laser capture microdissection is labour-intensive. Moreover, gene expression profiling using commercial microarrays introduces significant and underreported cross-species hybridization errors that are commonly overlooked by biologists. METHOD: We developed a customized dual-species array, the H&M array, and performed cross-species and species-specific hybridization measurements. We validated a new methodology for establishing the stroma versus cancer ratio using transcriptomic data. RESULTS: In the biological validation of the H&M array, cross-species hybridization of human and mouse probes was significantly reduced (4.5- and 9.4-fold reduction, respectively; p < 2 × 10(-16) for both, Mann-Whitney test). We confirmed the capability of the H&M array to determine the stromal-to-cancer cell ratio based on estimation of a cellularity index of mouse/human mRNA content in vitro. This new metric enables more efficient investigation of stroma-cancer cell interactions (e.g. cellularity), bypassing the labour-intensive requirements and biases of laser capture microdissection. CONCLUSION: These results provide initial evidence of improved and cost-efficient analytics for the investigation of the cancer cell microenvironment, using species-specific arrays designed for xenograft models.
Subjects
Neoplastic Cell Transformation , Gene Expression Profiling , Genomics/methods , Molecular Targeted Therapy , Neoplasms/genetics , Neoplasms/pathology , Xenograft Model Antitumor Assays , Animals , Humans , Mice , Molecular Sequence Annotation , Neoplasms/drug therapy , Nucleic Acid Hybridization , Oligonucleotide Array Sequence Analysis , Messenger RNA/genetics , Messenger RNA/metabolism , Reproducibility of Results , Species Specificity , Stromal Cells/metabolism , Stromal Cells/pathology , Tumor Microenvironment
ABSTRACT
BACKGROUND: While genome-wide association studies (GWAS) of complex traits have revealed thousands of reproducible genetic associations to date, these loci collectively account for very little of the heritability of their respective diseases and, in general, have contributed little to our understanding of the underlying disease biology. Physical protein interactions have been utilized to increase our understanding of human Mendelian disease loci but have yet to be fully exploited for complex traits. METHODS: We hypothesized that protein interaction modeling of GWAS findings could highlight important disease-associated loci and unveil the role of their network topology in the genetic architecture of diseases with complex inheritance. RESULTS: Network modeling of proteins associated with the intragenic single nucleotide polymorphisms of the National Human Genome Research Institute catalog of complex trait GWAS revealed that complex trait associated loci are more likely to be hub and bottleneck genes in available, albeit incomplete, networks (OR=1.59, Fisher's exact test p < 2.24 × 10(-12)). Network modeling also prioritized novel type 2 diabetes (T2D) genetic variations from the Finland-USA Investigation of Non-Insulin-Dependent Diabetes Mellitus Genetics and the Wellcome Trust GWAS data, and demonstrated the enrichment of hubs and bottlenecks in prioritized T2D GWAS genes. The potential biological relevance of the T2D hub and bottleneck genes was revealed by their increased number of first-degree protein interactions with known T2D genes according to several independent sources (p<0.01, probability of being first interactors of known T2D genes). CONCLUSION: Virtually all common diseases are complex human traits, and thus the topological centrality in protein networks of complex trait genes has implications in genetics, personal genomics, and therapy.
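The notions of hub and bottleneck used above can be made concrete with a small sketch. It assumes the common convention that hubs are nodes of high degree and bottlenecks are nodes of high betweenness centrality; the abstract does not give the authors' exact definitions or thresholds, and the toy network and function names are hypothetical. Betweenness is computed with Brandes' algorithm for an unweighted, undirected graph.

```python
from collections import deque

# Hypothetical toy interaction network: two triangles joined through node G.
toy = {
    "A": {"B", "C"}, "B": {"A", "C"}, "C": {"A", "B", "G"},
    "G": {"C", "D"},
    "D": {"G", "E", "F"}, "E": {"D", "F"}, "F": {"D", "E"},
}


def degree(adj):
    """Number of interaction partners per node (high values suggest hubs)."""
    return {node: len(neighbours) for node, neighbours in adj.items()}


def betweenness(adj):
    """Unnormalised betweenness centrality (Brandes' algorithm, unweighted graph)."""
    centrality = {node: 0.0 for node in adj}
    for source in adj:
        order = []                                   # nodes in order of BFS visit
        preds = {node: [] for node in adj}           # predecessors on shortest paths
        n_paths = {node: 0 for node in adj}          # shortest-path counts from source
        dist = {node: -1 for node in adj}
        n_paths[source], dist[source] = 1, 0
        queue = deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:                      # first time w is reached
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:           # v lies on a shortest path to w
                    n_paths[w] += n_paths[v]
                    preds[w].append(v)
        dependency = {node: 0.0 for node in adj}
        for w in reversed(order):                    # accumulate from the farthest nodes
            for v in preds[w]:
                dependency[v] += n_paths[v] / n_paths[w] * (1 + dependency[w])
            if w != source:
                centrality[w] += dependency[w]
    # Each unordered pair was counted from both endpoints in an undirected graph.
    return {node: value / 2 for node, value in centrality.items()}


deg = degree(toy)
btw = betweenness(toy)
for node in sorted(toy):
    print(node, deg[node], btw[node])
```

In this toy network G has the highest betweenness but only two partners, while C and D have the most partners, which shows why the two measures are treated as distinct properties.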
Subjects
Genome-Wide Association Study , Genetic Models , Protein Interaction Maps/genetics , Computational Biology/methods , Humans , Single Nucleotide Polymorphism , Protein Interaction Mapping
ABSTRACT
OBJECTIVE: Thousands of complex-disease single-nucleotide polymorphisms (SNPs) have been discovered in genome-wide association studies (GWAS). However, these intragenic SNPs have not been collectively mined to unveil the genetic architecture between complex clinical traits. The authors hypothesize that biological annotations of host genes of trait-associated SNPs may reveal the biomolecular modularity across complex-disease traits and offer insights for drug repositioning. METHODS: Trait-to-polymorphism (SNPs) associations confirmed in GWAS were used. A novel method to quantify trait-trait similarity anchored in Gene Ontology annotations of human proteins and information theory was developed. The results were then validated with the shortest paths of physical protein interactions between biologically similar traits. RESULTS: A network was constructed consisting of 280 significant intertrait similarities among 177 disease traits, which covered 1438 well-validated disease-associated SNPs. Thirty-nine percent of intertrait connections were confirmed by curators, and the following additional studies demonstrated the validity of a proportion of the remainder. On a phenotypic trait level, higher Gene Ontology similarity between proteins correlated with smaller 'shortest distance' in protein interaction networks of complexly inherited diseases (Spearman p<2.2×10(-16)). Further, 'cancer traits' were similar to one another, as were 'metabolic syndrome traits' (Fisher's exact test p=0.001 and 3.5×10(-7), respectively). CONCLUSION: An imputed disease network by information-anchored functional similarity from GWAS trait-associated SNPs is reported. It is also demonstrated that small shortest paths of protein interactions correlate with complex-disease function. Taken together, these findings provide the framework for investigating drug targets with unbiased functional biomolecular networks rather than worn-out single-gene and subjective canonical pathway approaches.
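To illustrate what an information-theoretic similarity over Gene Ontology annotations can look like, the sketch below uses Resnik's measure: the information content of the most informative ancestor shared by two terms. This is a well-known technique swapped in for illustration and is not claimed to be the authors' novel method, which the abstract does not specify. The ontology, annotations, and all names are hypothetical toy data.

```python
import math

# Hypothetical toy ontology: each term maps to its direct parents.
PARENTS = {
    "root": set(),
    "metabolism": {"root"},
    "signaling": {"root"},
    "glucose_metabolism": {"metabolism"},
    "lipid_metabolism": {"metabolism"},
}

# Hypothetical direct annotations of proteins to terms.
ANNOTATIONS = {
    "P1": {"glucose_metabolism"},
    "P2": {"lipid_metabolism"},
    "P3": {"signaling"},
    "P4": {"glucose_metabolism"},
}


def ancestors(term):
    """Return the term itself together with all of its ancestors."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def information_content():
    """IC(t) = -log(fraction of proteins annotated to t or any descendant of t)."""
    counts = {term: 0 for term in PARENTS}
    for terms in ANNOTATIONS.values():
        covered = set()
        for term in terms:
            covered |= ancestors(term)
        for term in covered:
            counts[term] += 1
    total = len(ANNOTATIONS)
    return {t: -math.log(c / total) for t, c in counts.items() if c > 0}


def resnik(term_a, term_b, ic):
    """Information content of the most informative common ancestor."""
    return max(ic[t] for t in ancestors(term_a) & ancestors(term_b))


ic = information_content()
print(resnik("glucose_metabolism", "lipid_metabolism", ic))
print(resnik("glucose_metabolism", "signaling", ic))
```

Terms that share only the root score zero, while terms sharing a more specific ancestor score higher. A trait-level similarity would then need some aggregation over the terms of each trait's genes, a step left out here because the source does not describe it.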
Subjects
Disease/genetics , Information Theory , Biological Models , Single Nucleotide Polymorphism , Genome-Wide Association Study , Humans , Phenotype , Protein Interaction Maps
ABSTRACT
SIRE1 is a 2000-copy member of the Ty1/copia retroelement family found in the soybean genome and is closely related to sireviruses found in the genomes of other legumes. Although these elements closely resemble typical plant members of the Ty1/copia family, they are unusual in that they possess an envelope-like coding region immediately downstream of the reverse transcriptase gene. Despite its copy number, very few members of the SIRE1 family are currently present in publicly available genomic assemblies or draft contigs. However, fragments of family members are well represented as BAC-ends in the GenBank Genome Survey Sequence database. This database was queried using the 5' and 3' ends of SIRE1 in order to catalog sequences into which SIRE1 members have integrated. Seven hundred and eighty-one unique SIRE1 insertions were identified, and the majority of insertion sites lay within other repetitive elements, including Class I and Class II transposable elements and satellite DNAs. Ninety-four insertions were in single- or low-copy number sequences, and three of these were homologous to characterized protein-coding genes. Examination of the ten bases flanking either side of SIRE1 revealed no clear consensus sequence, but the distributions of A, C, G, and T at most of the positions were biased with strong statistical significance.
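The per-position base bias described in the last sentence can be sketched as a chi-square goodness-of-fit statistic computed at each flanking position. This is an illustration under assumptions: the abstract does not name the test used or the background composition, so a uniform background is assumed here, and the sequences and function names are hypothetical.

```python
from collections import Counter

BASES = "ACGT"

# Assumed background composition (uniform); a genome-wide composition could be used instead.
UNIFORM = {base: 0.25 for base in BASES}


def position_counts(flanks):
    """Count A, C, G, T at each position across equal-length flanking sequences."""
    length = len(flanks[0])
    if any(len(seq) != length for seq in flanks):
        raise ValueError("all flanking sequences must have the same length")
    return [Counter(seq[i] for seq in flanks) for i in range(length)]


def chi_square(counts, background=UNIFORM):
    """Chi-square goodness-of-fit statistic for one position (3 degrees of freedom)."""
    n = sum(counts[base] for base in BASES)
    return sum(
        (counts[base] - n * background[base]) ** 2 / (n * background[base])
        for base in BASES
    )


# Hypothetical two-base flanks: position 0 is always A, position 1 is balanced.
toy_flanks = ["AA", "AA", "AC", "AC", "AG", "AG", "AT", "AT"]
stats = [chi_square(c) for c in position_counts(toy_flanks)]
print(stats)
```

With 3 degrees of freedom, a statistic above 7.815 corresponds to p < 0.05, so position 0 of the toy data is strongly biased while position 1 is not. Real insertion-site data would also call for a correction for testing many positions.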