ABSTRACT
Although genomic research has predominantly relied on phenotypic ascertainment of individuals affected with heritable disease, the falling costs of sequencing allow consideration of genomic ascertainment and reverse phenotyping (the ascertainment of individuals with specific genomic variants and subsequent evaluation of physical characteristics). In this research modality, the scientific question is inverted: investigators gather individuals with a genomic variant and test the hypothesis that there is an associated phenotype via targeted phenotypic evaluations. Genomic ascertainment research is thus a model of predictive genomic medicine and genomic screening. Here, we provide our experience implementing this research method. We describe the infrastructure we developed to perform reverse phenotyping studies, including aggregating a super-cohort of sequenced individuals who consented to recontact for genomic ascertainment research. We assessed 13 studies completed at the National Institutes of Health (NIH) that piloted our reverse phenotyping approach. The studies can be broadly categorized as (1) facilitating novel genotype-disease associations, (2) expanding the phenotypic spectra, or (3) demonstrating ex vivo functional mechanisms of disease. We highlight three examples of reverse phenotyping studies in detail and describe how using a targeted reverse phenotyping approach (as opposed to phenotypic ascertainment or clinical informatics approaches) was crucial to the conclusions reached. Finally, we propose a framework and address challenges to building collaborative genomic ascertainment research programs at other institutions. Our goal is for more researchers to take advantage of this approach, which will expand our understanding of the predictive capability of genomic medicine and increase the opportunity to mitigate genomic disease.
Subject(s)
Genome , Medical Informatics , Phenotype , Genotype , Genomics/methods
ABSTRACT
Rationale: Previous studies identified an interaction between HLA and oral peanut exposure. In the LEAP clinical trial for prevention of peanut allergy, HLA-DQA1*01:02 was protective in the peanut consumption group, where it was associated with induction of Ara h 2 epitope-specific IgG4, whereas it was a risk allele for peanut allergy in the peanut avoidance group. We have now evaluated this gene-environment interaction in two subsequent peanut oral immunotherapy (OIT) trials, IMPACT and POISED, to better understand the potential of the HLA-DQA1*01:02 allele as an indicator of a higher likelihood of desensitization, sustained unresponsiveness, and peanut allergy remission. Methods: We determined HLA-DQA1*01:02 carrier status using genome sequencing in POISED (N=118, age: 7-55 yr) and IMPACT (N=126, age: 12-<48 mo). We tested for association with remission, sustained unresponsiveness (SU), and desensitization in the OIT groups, as well as with peanut component-specific IgG4 (psIgG4), using generalized linear models adjusted for relevant covariates and ancestry. Results: Although the difference did not reach statistical significance, a higher proportion of HLA-DQA1*01:02 carriers receiving OIT in IMPACT were desensitized (93%) compared with non-carriers (78%); odds ratio (OR)=5.74 (p=0.06). In this sample, a higher proportion of carriers also achieved remission (35%) compared with non-carriers (22%); OR=1.26 (p=0.80). In POISED, carriers more frequently attained continued desensitization (80% versus 61% among non-carriers; OR=1.28, p=0.86) and achieved SU (52% versus 31%; OR=2.32, p=0.19). psIgG4 associations with HLA-DQA1*01:02 in the OIT arm of IMPACT, which included younger study subjects, recapitulated patterns noted in LEAP, but no notable associations were observed in the older POISED study subjects. Conclusions: Findings across three clinical trials show a pattern of gene-environment interaction between HLA and oral peanut exposure. Age and prior sensitization are additional determinants of outcomes, consistent with a mechanism in which restricted antigen recognition is fundamental to driving protective immune responses to OIT.
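As a reading aid for the odds ratios above, the following minimal Python sketch computes a crude (unadjusted) odds ratio from two proportions. The function names are hypothetical, and the inputs are the rounded percentages quoted in the abstract. The abstract reports ORs from generalized linear models adjusted for covariates and ancestry, so a crude value is not expected to reproduce them.

```python
def odds(p: float) -> float:
    """Convert a proportion into odds, p / (1 - p)."""
    if not 0.0 < p < 1.0:
        raise ValueError("proportion must be strictly between 0 and 1")
    return p / (1.0 - p)


def crude_odds_ratio(p_carriers: float, p_noncarriers: float) -> float:
    """Unadjusted odds ratio comparing carriers with non-carriers."""
    return odds(p_carriers) / odds(p_noncarriers)


# Rounded proportions quoted in the abstract (IMPACT, desensitization).
# The result is a crude estimate only; the published OR is model-adjusted.
impact_crude = crude_odds_ratio(0.93, 0.78)
print(f"crude OR (IMPACT desensitization): {impact_crude:.2f}")
```

The gap between this crude figure and the reported OR of 5.74 illustrates how covariate adjustment can shift an estimate, particularly in small samples.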
Subject(s)
Arachis , Peanut Hypersensitivity , Adolescent , Adult , Child , Humans , Middle Aged , Young Adult , Immunoglobulin G , Immunologic Factors , Immunotherapy , Peanut Hypersensitivity/genetics , Peanut Hypersensitivity/therapy , Clinical Trials as Topic
ABSTRACT
Research into rare diseases is typically fragmented by data type and disease. Individual efforts often have poor interoperability and do not systematically connect data across clinical phenotype, genomic data, biomaterial availability, and research/trial data sets. Such data must be linked at both an individual-patient and whole-cohort level to enable researchers to gain a complete view of their disease and patient population of interest. Data access and authorization procedures are required to allow researchers in multiple institutions to securely compare results and gain new insights. Funded by the European Union's Seventh Framework Programme under the International Rare Diseases Research Consortium (IRDiRC), RD-Connect is a global infrastructure project initiated in November 2012 that links genomic data with registries, biobanks, and clinical bioinformatics tools to produce a central research resource for rare diseases.
Subject(s)
Biological Specimen Banks , Computational Biology , Databases, Factual , Health Information Exchange , Rare Diseases , Registries , Humans
ABSTRACT
Genome-wide association studies (GWAS) are a useful approach in the study of the genetic components of complex phenotypes. Aside from large cohorts, GWAS have generally been limited to the study of one or a few diseases or traits. The emergence of biobanks linked to electronic medical records (EMRs) allows the efficient reuse of genetic data to yield meaningful genotype-phenotype associations for multiple phenotypes or traits. Phase I of the electronic MEdical Records and GEnomics (eMERGE-I) Network is a National Human Genome Research Institute-supported consortium of five sites that perform various genetic association studies using DNA repositories and EMR systems. The eMERGE sites developed EMR-based algorithms for a core set of 14 phenotypes, which were used to extract study samples from each site's DNA repository. Each eMERGE site selected samples for a specific phenotype, and these samples were genotyped at either the Broad Institute or the Center for Inherited Disease Research using the Illumina Infinium BeadChip technology. In all, approximately 17,000 samples from across the five sites were genotyped. A unified quality control (QC) pipeline was developed by the eMERGE Genomics Working Group and used to ensure thorough cleaning of the data. This process includes examination of sample and marker quality and various batch effects. Upon completion of the genotyping and QC analyses for each site's primary study, the eMERGE Coordinating Center merged the datasets from all five sites. This larger merged dataset then reentered the established eMERGE QC pipeline. Based on lessons learned during the process, additional analyses and QC checkpoints were added to the pipeline to ensure proper merging. Here, we explore the challenges associated with combining datasets from different genotyping centers and describe the expansion of the eMERGE QC pipeline for merged datasets. These additional steps will be useful as the eMERGE project expands to include additional sites in eMERGE-II, and will also serve as a starting point for investigators merging multiple genotype datasets accessible through the National Center for Biotechnology Information in the database of Genotypes and Phenotypes. Our experience demonstrates that merging multiple datasets after additional QC can be an efficient use of genotype data despite new challenges that appear in the process.
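One kind of checkpoint that can matter when combining datasets from different genotyping centers is an allele-coding comparison on shared markers. The Python sketch below is only an illustration under stated assumptions: it is not the eMERGE pipeline, and the function names, marker identifiers, and four status labels are all hypothetical.

```python
from typing import Dict, List, Tuple

Alleles = Tuple[str, str]
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def classify_marker(a: Alleles, b: Alleles) -> str:
    """Compare the allele pair reported for one marker by two datasets."""
    set_a, set_b = frozenset(a), frozenset(b)
    flipped_b = frozenset(COMPLEMENT[x] for x in b)
    if set_a == set_b and set_a == flipped_b:
        # A/T and C/G markers look identical on either strand.
        return "ambiguous"
    if set_a == set_b:
        return "match"
    if set_a == flipped_b:
        return "strand_flip"
    return "mismatch"


def premerge_report(
    dataset_a: Dict[str, Alleles], dataset_b: Dict[str, Alleles]
) -> Dict[str, List[str]]:
    """Group the markers shared by both datasets by comparison status."""
    report: Dict[str, List[str]] = {
        "match": [], "strand_flip": [], "ambiguous": [], "mismatch": []
    }
    for marker in sorted(dataset_a.keys() & dataset_b.keys()):
        status = classify_marker(dataset_a[marker], dataset_b[marker])
        report[status].append(marker)
    return report


# Hypothetical marker names and alleles, for illustration only.
center_1 = {"m1": ("A", "G"), "m2": ("A", "G"), "m3": ("A", "T"),
            "m4": ("C", "T"), "m5": ("A", "C")}
center_2 = {"m1": ("G", "A"), "m2": ("T", "C"), "m3": ("T", "A"),
            "m4": ("A", "C")}
print(premerge_report(center_1, center_2))
```

Markers flagged as `strand_flip` can typically be harmonized before merging, whereas `ambiguous` and `mismatch` markers need further evidence or exclusion.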
Subject(s)
Electronic Health Records , Genome-Wide Association Study/standards , Quality Control , Algorithms , Genotype , Humans , National Human Genome Research Institute (U.S.) , Phenotype , United States
ABSTRACT
Genome-wide association studies (GWAS) are being conducted at an unprecedented rate in population-based cohorts and have increased our understanding of the pathophysiology of complex disease. Regardless of context, the practical utility of this information will ultimately depend upon the quality of the original data. Quality control (QC) procedures for GWAS are computationally intensive, operationally challenging, and constantly evolving. Here we enumerate some of the challenges in QC of GWAS data and describe the approaches that the electronic MEdical Records and Genomics (eMERGE) network is using for quality assurance in GWAS data, thereby minimizing potential bias and error in GWAS results. We discuss common issues associated with QC of GWAS data, including data file formats, software packages for data manipulation and analysis, sex chromosome anomalies, sample identity, sample relatedness, population substructure, batch effects, and marker quality. We propose best practices and discuss areas of ongoing and future research.
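To make the sample-quality and marker-quality checks concrete, here is a minimal Python sketch of a call-rate filter. The 0.98 thresholds, the function names, and the choice to filter samples before markers are assumptions made for illustration, not the eMERGE network's documented settings.

```python
from typing import List, Optional, Sequence, Tuple

# 0, 1 or 2 copies of the alternate allele; None marks a failed call.
Genotype = Optional[int]


def call_rate(calls: Sequence[Genotype]) -> float:
    """Fraction of non-missing genotype calls."""
    return sum(c is not None for c in calls) / len(calls)


def filter_by_call_rate(
    matrix: List[List[Genotype]],
    min_sample: float = 0.98,   # assumed threshold
    min_marker: float = 0.98,   # assumed threshold
) -> Tuple[List[int], List[int]]:
    """Return indices of the samples (rows) and markers (columns) kept.

    Samples are filtered first; marker call rates are then computed
    over the retained samples only.
    """
    kept_samples = [i for i, row in enumerate(matrix)
                    if call_rate(row) >= min_sample]
    if not kept_samples:
        return [], []
    n_markers = len(matrix[0])
    kept_markers = [
        j for j in range(n_markers)
        if call_rate([matrix[i][j] for i in kept_samples]) >= min_marker
    ]
    return kept_samples, kept_markers


# Toy genotype matrix: 3 samples x 3 markers.
toy = [
    [0, 1, None],
    [1, None, None],
    [2, 2, 0],
]
print(filter_by_call_rate(toy, min_sample=0.6, min_marker=0.6))
```

Filtering samples first keeps a few very poor samples from dragging down the call rate of otherwise good markers; the reverse order is equally defensible, and some pipelines iterate between the two.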
Subject(s)
Genome-Wide Association Study/standards , Software , Electronic Health Records , Genome-Wide Association Study/methods , Genomics , Genotype , Humans , Phenotype , Quality Control
ABSTRACT
Chemical cross-linking and high-resolution MS have been integrated successfully to capture protein interactions and provide low-resolution structural data for proteins that are refractory to analysis by NMR or crystallography. Despite the versatility of these combined techniques, the array of products generated by the cross-linking and proteolytic digestion of proteins is immense and generally requires the use of labeling strategies and/or database search algorithms to distinguish actual cross-linked peptides from the many side products of cross-linking. Most strategies reported to date have focused on the analysis of small cross-linked protein complexes (<60 kDa) because the number of potential forms of covalently modified peptides increases dramatically with the number of peptides generated from the digestion of such complexes. We report herein the development of a user-friendly search engine, CrossSearch, that provides the foundation for an overarching strategy to detect cross-linked peptides from the digests of large (≥170-kDa) cross-linked proteins, i.e. conjugates. Our strategy combines the use of a low excess of cross-linker, database searching, and Fourier transform ion cyclotron resonance MS to experimentally minimize and theoretically cull the side products of cross-linking. Using this strategy, the (αβγδ)4 phosphorylase kinase model complex was cross-linked to form with high specificity a 170-kDa βγ conjugate in which we identified residues involved in the intramolecular cross-linking of the 125-kDa β subunit between its regulatory N terminus and its C terminus. This finding provides an explanation for previously published homodimeric two-hybrid interactions of the β subunit and suggests a dynamic structural role for the regulatory N terminus of that subunit. The results offer proof of concept for the CrossSearch strategy for analyzing conjugates and are the first to reveal a tertiary structural element of either the homologous α or β regulatory subunit of phosphorylase kinase.
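The general idea of searching a theoretical list of cross-linked peptides against measured masses can be sketched as follows. This is not the CrossSearch implementation: the function names, peptide labels, masses, linker mass shift, and ppm tolerance are all hypothetical. The sketch also considers only pairs of distinct peptides and ignores side products such as dead-end modifications.

```python
from itertools import combinations
from typing import Dict, List, Tuple

Candidate = Tuple[str, str, float]


def candidate_crosslinks(
    peptide_masses: Dict[str, float], linker_delta: float
) -> List[Candidate]:
    """Enumerate theoretical masses for every pair of distinct peptides.

    linker_delta is the net mass the cross-linker adds when it bridges
    two peptides (an input assumption, not a property of a real reagent).
    """
    names = sorted(peptide_masses)
    return [
        (a, b, peptide_masses[a] + peptide_masses[b] + linker_delta)
        for a, b in combinations(names, 2)
    ]


def match_observed(
    candidates: List[Candidate], observed: List[float], tol_ppm: float = 5.0
) -> List[Tuple[float, str, str]]:
    """Return (observed mass, peptide, peptide) for masses within tolerance."""
    hits = []
    for obs in observed:
        for a, b, theoretical in candidates:
            if abs(obs - theoretical) / theoretical * 1e6 <= tol_ppm:
                hits.append((obs, a, b))
    return hits


# Hypothetical monoisotopic masses (Da) and linker shift, for illustration.
peptides = {"pepA": 1000.5, "pepB": 1500.75, "pepC": 800.25}
cands = candidate_crosslinks(peptides, linker_delta=100.0)
print(match_observed(cands, observed=[2601.26, 1900.75, 3000.0]))
```

Because the number of candidate pairs grows quadratically with the number of peptides, a tight mass tolerance from a high-resolution instrument limits false matches as complexes get larger.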
Subject(s)
Cross-Linking Reagents/chemistry , Peptides/analysis , Protein Interaction Mapping/methods , Proteins/chemistry , Software , Animals , Cyclotrons , Fourier Analysis , Internet , Mass Spectrometry/methods , Peptides/chemistry , Phosphorylase Kinase/chemistry , Protein Subunits/chemistry , Rabbits
ABSTRACT
Phosphorylase kinase (PhK), an (αβγδ)4 complex, regulates glycogenolysis. Its activity, catalyzed by the γ subunit, is tightly controlled by phosphorylation and activators acting through allosteric sites on its regulatory α, β and δ subunits. Activation by phosphorylation is predominantly mediated by the regulatory β subunit, which undergoes a conformational change that is structurally linked with the γ subunit and that is characterized by the ability of a short chemical crosslinker to form β-β dimers. To determine potential regions of interaction of the β and γ subunits, we have used chemical crosslinking and two-hybrid screening. The β and γ subunits were crosslinked to each other in phosphorylated PhK, and crosslinked peptides from digests were identified by Fourier transform mass spectrometry, beginning with a search engine developed "in house" that generates a hypothetical list of crosslinked peptides. A conjugate between β and γ that was verified by MS/MS corresponded to crosslinking between K303 in the C-terminal regulatory domain of γ (γCRD) and R18 in the N-terminal regulatory region of β (β1-31), which contains the phosphorylatable serines 11 and 26. A synthetic peptide corresponding to residues 1-22 of β inhibited the crosslinking between β and γ, and was itself crosslinked to K303 of γ. In two-hybrid screening, the β1-31 region controlled β subunit self-interactions, in that they were favored by truncation of this region or by mutation of the phosphorylatable serines 11 and 26, thus providing structural evidence for a phosphorylation-dependent subunit communication network in the PhK complex involving at least these two regulatory regions of the β and γ subunits. The sum of our results considered together with previous findings implicates the γCRD as being an allosteric activation switch in PhK that interacts with all three of the enzyme's regulatory subunits and is proximal to the active site cleft.
Subject(s)
Allosteric Regulation/drug effects , Allosteric Site/drug effects , Cross-Linking Reagents/pharmacology , Mass Spectrometry/methods , Peptides/metabolism , Phosphorylase Kinase/metabolism , Amino Acid Sequence , Amino Acids/metabolism , Animals , Models, Biological , Molecular Sequence Data , Mutant Proteins/analysis , Mutant Proteins/chemistry , Mutant Proteins/metabolism , Phosphorylase Kinase/analysis , Phosphorylase Kinase/chemistry , Phosphorylation/drug effects , Phosphoserine/metabolism , Point Mutation/genetics , Protein Binding/drug effects , Protein Interaction Mapping , Protein Structure, Quaternary/drug effects , Protein Structure, Tertiary/drug effects , Protein Subunits/analysis , Protein Subunits/chemistry , Protein Subunits/metabolism , Rabbits , Sequence Deletion/genetics , Structural Homology, Protein , Succinimides/pharmacology
ABSTRACT
MOTIVATION: The abundance of nucleotide sequence information available has expanded horizons of inquiry for molecular evolution; however, the full potential of whole-genome analysis has not been realized because of inadequate tools. Here, we present one of the first toolkits to aid multidisciplinary high-throughput analysis. SUMMARY: SPEED was created to integrate molecular evolutionary data with existing genetic resources and provide a straightforward user interface to 17,352 orthologous gene groups, containing representatives from eight mammalian species and an avian outgroup. AVAILABILITY: See http://bioinfobase.umkc.edu/speed/ for access.
Subject(s)
Computational Biology/methods , Databases, Protein , Evolution, Molecular , Algorithms , Animals , Databases, Genetic , Genome , Humans , Phylogeny , Sequence Alignment , Software
ABSTRACT
BACKGROUND: While studies of non-model organisms are critical for many research areas, such as evolution, development, and environmental biology, they present particular challenges for both experimental and computational genomic-level research. Resources such as mass-produced microarrays and the computational tools linking these data to functional annotation at the system and pathway level are rarely available for non-model species. This type of "systems-level" analysis is critical to the understanding of patterns of gene expression that underlie biological processes. RESULTS: We describe a bioinformatics pipeline known as FunnyBase that has been used to store, annotate, and analyze 40,363 expressed sequence tags (ESTs) from the heart and liver of the fish Fundulus heteroclitus. Primary annotations based on sequence similarity are linked to networks of systematic annotation in Gene Ontology (GO) and the Kyoto Encyclopedia of Genes and Genomes (KEGG) and can be queried and used computationally in downstream analyses. Steps are taken to ensure that the annotation is self-consistent and that the structure of GO is used to identify higher-level functions that may not be annotated directly. An integrated framework for cDNA library production, sequencing, quality control, expression data generation, and systems-level analysis is presented and utilized. In a case study, a set of genes whose expression levels showed statistically significant regression on environmental temperature along the Atlantic Coast shows a statistically significant (P < 0.001) enrichment in genes associated with amine metabolism. CONCLUSION: The methods described have application for functional genomics studies, particularly among non-model organisms. The web interface for FunnyBase can be accessed at http://genomics.rsmas.miami.edu/funnybase/super_craw4/. Data and source code are available by request at jpaschall@bioinfobase.umkc.edu.
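The two ideas in the RESULTS section, using the ontology structure to infer higher-level functions and testing a gene set for enrichment, can be sketched in Python as below. This is an illustration under stated assumptions, not FunnyBase code: the term labels, gene names, and function names are hypothetical, and a one-sided hypergeometric test stands in for whatever enrichment statistic the authors used.

```python
from math import comb
from typing import Dict, Set


def ancestors(term: str, parents: Dict[str, Set[str]]) -> Set[str]:
    """All terms reachable from `term` by following parent links."""
    seen: Set[str] = set()
    stack = [term]
    while stack:
        for parent in parents.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def propagate(
    direct: Dict[str, Set[str]], parents: Dict[str, Set[str]]
) -> Dict[str, Set[str]]:
    """Extend each gene's direct annotations with all ancestor terms."""
    full = {}
    for gene, terms in direct.items():
        expanded = set(terms)
        for term in terms:
            expanded |= ancestors(term, parents)
        full[gene] = expanded
    return full


def enrichment_p(k: int, n: int, K: int, N: int) -> float:
    """One-sided hypergeometric P(X >= k).

    N genes in total, K of them annotated with the term, n genes in the
    selected set, k of the selected genes annotated with the term.
    """
    upper = min(n, K)
    return sum(comb(K, i) * comb(N - K, n - i)
               for i in range(k, upper + 1)) / comb(N, n)


# Hypothetical three-level ontology fragment and one annotated gene.
toy_parents = {
    "amine_metabolism": {"nitrogen_metabolism"},
    "nitrogen_metabolism": {"metabolism"},
}
toy_direct = {"geneA": {"amine_metabolism"}}
print(propagate(toy_direct, toy_parents))
print(enrichment_p(k=3, n=3, K=3, N=10))
```

Propagating annotations before the enrichment test lets a parent term count genes annotated only to its more specific children, so broad functions are not missed.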