ABSTRACT
Many approaches to identify therapeutically relevant neoantigens couple tumor sequencing with bioinformatic algorithms and inferred rules of tumor epitope immunogenicity. However, there are no reference data to compare these approaches, and the parameters governing tumor epitope immunogenicity remain unclear. Here, we assembled a global consortium wherein each participant predicted immunogenic epitopes from shared tumor sequencing data. 608 epitopes were subsequently assessed for T cell binding in patient-matched samples. By integrating peptide features associated with presentation and recognition, we developed a model of tumor epitope immunogenicity that filtered out 98% of non-immunogenic peptides with a precision above 0.70. Pipelines prioritizing model features had superior performance, and pipeline alterations leveraging them improved prediction performance. These findings were validated in an independent cohort of 310 epitopes prioritized from tumor sequencing data and assessed for T cell binding. This data resource enables identification of parameters underlying effective anti-tumor immunity and is available to the research community.
Subject(s)
Antigens, Neoplasm/immunology , Epitopes/immunology , Neoplasms/immunology , Alleles , Antigen Presentation/immunology , Cohort Studies , Humans , Peptides/immunology , Programmed Cell Death 1 Receptor , Reproducibility of Results
ABSTRACT
Loss of function of the kinase IRAK4 or the adaptor MyD88 in humans interrupts a pathway critical for pathogen sensing and ignition of inflammation. However, patients with loss-of-function mutations in the genes encoding these factors are, unexpectedly, susceptible to only a limited range of pathogens. We employed a systems approach to investigate transcriptome responses following in vitro exposure of patients' blood to agonists of Toll-like receptors (TLRs) and receptors for interleukin 1 (IL-1Rs) and to whole pathogens. Responses to purified agonists were globally abolished, but variable residual responses were present following exposure to whole pathogens. Further delineation of the latter responses identified a narrow repertoire of transcriptional programs affected by loss of MyD88 function or IRAK4 function. Our work introduces the use of a systems approach for the global assessment of innate immune responses and the characterization of human primary immunodeficiencies.
Subject(s)
Immunologic Deficiency Syndromes/genetics , Immunologic Deficiency Syndromes/immunology , Interleukin-1 Receptor-Associated Kinases/genetics , Mutation , Myeloid Differentiation Factor 88/genetics , Adolescent , Child , Child, Preschool , Cluster Analysis , Female , Gene Expression Profiling , Humans , Immunity, Innate/genetics , Immunity, Innate/immunology , Infant , Interleukin-1 Receptor-Associated Kinases/immunology , Male , Oligonucleotide Array Sequence Analysis , Primary Immunodeficiency Diseases , Transcriptome
ABSTRACT
A continuing challenge in modern medicine is the identification of safer and more efficacious drugs. Precision therapeutics, which have one molecular target, have long been promised to be safer and more effective than traditional therapies. This approach has proven challenging for multiple reasons, including lack of efficacy, rapidly acquired drug resistance, and narrow patient eligibility criteria. An alternative approach is the development of drugs that address the overall disease network by targeting multiple biological targets ('polypharmacology'). Rational development of these molecules will require improved methods for predicting single chemical structures that target multiple drug targets. To address this need, we developed the Multi-Targeting Drug DREAM Challenge, in which we challenged participants to predict single chemical entities that target pro-targets but avoid anti-targets for two unrelated diseases: RET-based tumors and a common form of inherited tauopathy. Here, we report the results of this DREAM Challenge and the development of two neural network-based machine learning approaches that were applied to the challenge of rational polypharmacology. Together, these platforms provide a potentially useful first step towards developing lead therapeutic compounds that address disease complexity through rational polypharmacology.
Subject(s)
Drug Development , Neoplasms/drug therapy , Protein Kinase Inhibitors/pharmacology , Proto-Oncogene Proteins c-ret/antagonists & inhibitors , Tauopathies/drug therapy , Humans , Neoplasms/metabolism , Neural Networks, Computer , Polypharmacology , Protein Kinase Inhibitors/chemistry , Protein Kinase Inhibitors/therapeutic use , Proto-Oncogene Proteins c-ret/genetics , Proto-Oncogene Proteins c-ret/metabolism , tau Proteins/genetics , tau Proteins/metabolism
ABSTRACT
RNA secondary structure plays a central role in the replication and metabolism of all RNA viruses, including retroviruses like HIV-1. However, structures with known function represent only a fraction of the secondary structure reported for HIV-1(NL4-3). One tool to assess the importance of RNA structures is to examine their conservation over evolutionary time. To this end, we used SHAPE to model the secondary structure of a second primate lentiviral genome, SIVmac239, which shares only 50% sequence identity at the nucleotide level with HIV-1(NL4-3). Only about half of the paired nucleotides are paired in both genomic RNAs and, across the genome, just 71 base pairs form with the same pairing partner in both genomes. On average the RNA secondary structure is thus evolving at a much faster rate than the sequence. Structure at the Gag-Pro-Pol frameshift site is maintained but in a significantly altered form, while the impact of selection for maintaining a protein binding interaction can be seen in the conservation of pairing partners in the small RRE stems where Rev binds. Structures that are conserved between SIVmac239 and HIV-1(NL4-3) also occur at the 5' polyadenylation sequence, in the plus strand primer sites, PPT and cPPT, and in the stem-loop structure that includes the first splice acceptor site. The two genomes are adenosine-rich and cytidine-poor. The structured regions are enriched in guanosines, while unpaired regions are enriched in adenosines, and functionally important structures have stronger base pairing than nonconserved structures. We conclude that much of the secondary structure is the result of fortuitous pairing in a metastable state that reforms during sequence evolution. However, secondary structure elements with important function are stabilized by higher guanosine content that allows regions of structure to persist as sequence evolution proceeds, and, within the confines of selective pressure, allows structures to evolve.
Subject(s)
Genome, Viral , HIV-1/genetics , Nucleic Acid Conformation , RNA, Viral/chemistry , RNA, Viral/genetics , Simian Immunodeficiency Virus/genetics , Animals , Base Composition , Base Sequence , Binding Sites , Evolution, Molecular , Frameshift Mutation , Genes, env/genetics , Humans , Mice , RNA-Binding Proteins/metabolism , Sequence Alignment , Sequence Homology, Nucleic Acid
ABSTRACT
Single-stranded RNA viruses encompass broad classes of infectious agents and cause the common cold, cancer, AIDS and other serious health threats. Viral replication is regulated at many levels, including the use of conserved genomic RNA structures. Most potential regulatory elements in viral RNA genomes are uncharacterized. Here we report the structure of an entire HIV-1 genome at single nucleotide resolution using SHAPE, a high-throughput RNA analysis technology. The genome encodes protein structure at two levels. In addition to the correspondence between RNA and protein primary sequences, a correlation exists between high levels of RNA structure and sequences that encode inter-domain loops in HIV proteins. This correlation suggests that RNA structure modulates ribosome elongation to promote native protein folding. Some simple genome elements previously shown to be important, including the ribosomal gag-pol frameshift stem-loop, are components of larger RNA motifs. We also identify organizational principles for unstructured RNA regions, including splice site acceptors and hypervariable regions. These results emphasize that the HIV-1 genome and, potentially, many coding RNAs are punctuated by previously unrecognized regulatory motifs and that extensive RNA structure constitutes an important component of the genetic code.
Subject(s)
Genome, Viral/genetics , HIV-1/genetics , Nucleic Acid Conformation , RNA, Viral/chemistry , RNA, Viral/genetics , Computational Biology , HIV Envelope Protein gp120/genetics , HIV-1/metabolism , Human Immunodeficiency Virus Proteins/chemistry , Human Immunodeficiency Virus Proteins/genetics , Protein Conformation , Protein Folding , Protein Sorting Signals/genetics
ABSTRACT
There is great interindividual variability in HIV-1 viral setpoint after seroconversion, some of which is known to be due to genetic differences among infected individuals. Here, our focus is on determining, genome-wide, the contribution of variable gene expression to viral control, and to relate it to genomic DNA polymorphism. RNA was extracted from purified CD4+ T-cells from 137 HIV-1 seroconverters, 16 elite controllers, and 3 healthy blood donors. Expression levels of more than 48,000 mRNA transcripts were assessed by the Human-6 v3 Expression BeadChips (Illumina). Genome-wide SNP data was generated from genomic DNA using the HumanHap550 Genotyping BeadChip (Illumina). We observed two distinct profiles with 260 genes differentially expressed depending on HIV-1 viral load. There was significant upregulation of expression of interferon stimulated genes with increasing viral load, including genes of the intrinsic antiretroviral defense. Upon successful antiretroviral treatment, the transcriptome profile of previously viremic individuals reverted to a pattern comparable to that of elite controllers and of uninfected individuals. Genome-wide evaluation of cis-acting SNPs identified genetic variants modulating expression of 190 genes. Those were compared to the genes whose expression was found associated with viral load: expression of one interferon stimulated gene, OAS1, was found to be regulated by a SNP (rs3177979, p = 4.9E-12); however, we could not detect an independent association of the SNP with viral setpoint. Thus, this study represents an attempt to integrate genome-wide SNP signals with genome-wide expression profiles in the search for biological correlates of HIV-1 control. It underscores the paradox of the association between increasing levels of viral load and greater expression of antiviral defense pathways. It also shows that elite controllers do not have a fully distinctive mRNA expression pattern in CD4+ T cells. Overall, changes in global RNA expression reflect responses to viral replication rather than a mechanism that might explain viral control.
Subject(s)
CD4-Positive T-Lymphocytes/immunology , CD4-Positive T-Lymphocytes/virology , Gene Expression Profiling , HIV Infections/genetics , HIV Infections/immunology , RNA, Messenger/genetics , Adult , Cell Separation , Female , Gene Expression , Genome-Wide Association Study , HIV-1/immunology , Humans , Male , Oligonucleotide Array Sequence Analysis , Polymorphism, Single Nucleotide , RNA, Messenger/analysis , Viral Load
ABSTRACT
The mutational deterministic hypothesis for the origin and maintenance of sexual reproduction posits that sex enhances the ability of natural selection to purge deleterious mutations after recombination brings them together into single genomes. This explanation requires negative epistasis, a type of genetic interaction where mutations are more harmful in combination than expected from their separate effects. The conceptual appeal of the mutational deterministic hypothesis has been offset by our inability to identify the mechanistic and evolutionary bases of negative epistasis. Here we show that negative epistasis can evolve as a consequence of sexual reproduction itself. Using an artificial gene network model, we find that recombination between gene networks imposes selection for genetic robustness, and that negative epistasis evolves as a by-product of this selection. Our results suggest that sexual reproduction selects for conditions that favour its own maintenance, a case of evolution forging its own path.
Subject(s)
Biological Evolution , Epistasis, Genetic , Genes, Synthetic/genetics , Models, Genetic , Reproduction/genetics , Selection, Genetic , Sex , Animals , Drosophila melanogaster/genetics , Genotype , Mutation/genetics
ABSTRACT
The availability of high-quality RNA-sequencing and genotyping data of post-mortem brain collections from consortia such as the CommonMind Consortium (CMC) and the Accelerating Medicines Partnership for Alzheimer's Disease (AMP-AD) Consortium enables the generation of a large-scale brain cis-eQTL meta-analysis. Here we generate cerebral cortical eQTL from 1433 samples available from four cohorts (identifying >4.1 million significant eQTL for >18,000 genes), as well as cerebellar eQTL from 261 samples (identifying 874,836 significant eQTL for >10,000 genes). We find substantially improved power in the meta-analysis over individual cohort analyses, particularly in comparison to the Genotype-Tissue Expression (GTEx) Project eQTL. Additionally, we observed differences in eQTL patterns between cerebral and cerebellar brain regions. We provide these brain eQTL as a resource for use by the research community. As a proof of principle for their utility, we apply a colocalization analysis to identify genes underlying the GWAS association peaks for schizophrenia and identify a potentially novel gene colocalization with lncRNA RP11-677M14.2 (posterior probability of colocalization 0.975).
Subject(s)
Cerebellar Cortex/metabolism , Cerebral Cortex/metabolism , Gene Expression Profiling , Quantitative Trait Loci , Datasets as Topic , Genome-Wide Association Study , Humans , Meta-Analysis as Topic , RNA, Long Noncoding/genetics , Schizophrenia/genetics
ABSTRACT
There is growing evidence that interactions between biological molecules (e.g., RNA-RNA, protein-protein, RNA-protein) place limits on the rate and trajectory of molecular evolution. Here, by extending Kimura's model of compensatory evolution at interacting sites, we show that the ratio of transition to transversion substitutions (kappa) at interacting sites should be equal to the square of the ratio at independent sites. Because transition mutations generally occur at a higher rate than transversions, the model predicts that kappa should be higher at interacting sites than at independent sites. We tested this prediction in 10 RNA secondary structures by comparing phylogenetically derived estimates of kappa in paired sites within stems (kappa(p)) and unpaired sites within loops (kappa(u)). Eight of the 10 structures showed an excellent match to the quantitative predictions of the model, and 9 of the 10 structures matched the qualitative prediction kappa(p) > kappa(u). Only the Rev response element from the human immunodeficiency virus (HIV) genome showed the reverse pattern, with kappa(p) < kappa(u). Although a variety of evolutionary forces could produce quantitative deviations from the model predictions, the reversal in magnitude of kappa(p) and kappa(u) could be achieved only by violating the model assumption that the underlying transition (or transversion) mutation rates were identical in paired and unpaired regions of the molecule. We explore the ability of the APOBEC3 enzymes, host defense mechanisms against retroviruses, which induce transition mutations preferentially in single-stranded regions of the HIV genome, to explain this exception to the rule. Taken as a whole, our findings suggest that kappa may have utility as a simple diagnostic to evaluate proposed secondary structures.
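The quantitative prediction stated in this abstract, that kappa at paired sites equals the square of kappa at unpaired sites, can be illustrated with a minimal sketch. The function names and the example value are hypothetical and chosen only for illustration; this is not the authors' estimation procedure, which relied on phylogenetically derived estimates of kappa.

```python
def predicted_paired_kappa(kappa_unpaired: float) -> float:
    """Return the kappa predicted at paired (interacting) sites.

    Implements only the relation stated in the abstract:
    kappa(p) = kappa(u) ** 2. The function name is hypothetical.
    """
    if kappa_unpaired < 0:
        raise ValueError("kappa is a ratio of rates and cannot be negative")
    return kappa_unpaired ** 2


def matches_qualitative_prediction(kappa_paired: float, kappa_unpaired: float) -> bool:
    """Check the qualitative prediction kappa(p) > kappa(u)."""
    return kappa_paired > kappa_unpaired


# Illustrative value only (not taken from the paper): when transitions
# outnumber transversions (kappa(u) > 1), squaring raises the ratio.
example_kappa_u = 2.0
example_kappa_p = predicted_paired_kappa(example_kappa_u)
print(example_kappa_p, matches_qualitative_prediction(example_kappa_p, example_kappa_u))
```

Note that squaring raises the ratio only when kappa(u) exceeds 1, which is why the qualitative prediction depends on transitions occurring at a higher rate than transversions.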
Subject(s)
Evolution, Molecular , Genes, env/genetics , Models, Genetic , Mutation/genetics , Nucleic Acid Conformation , Phylogeny , RNA/genetics , APOBEC Deaminases , Bayes Theorem , Computational Biology , Cytidine Deaminase , Cytosine Deaminase/genetics , Sequence Alignment
ABSTRACT
Previous genome-wide association studies (GWAS), conducted by our group and others, have identified loci that harbor risk variants for neurodegenerative diseases, including Alzheimer's disease (AD). Human disease variants are enriched for polymorphisms that affect gene expression, including some that are known to associate with expression changes in the brain. Postulating that many variants confer risk to neurodegenerative disease via transcriptional regulatory mechanisms, we have analyzed gene expression levels in the brain tissue of subjects with AD and related diseases. Herein, we describe our collective datasets, comprising GWAS data from 2,099 subjects; microarray gene expression data from 773 brain samples, 186 of which also have RNAseq; and an independent cohort of 556 brain samples with RNAseq. We expect that these datasets, which are available to all qualified researchers, will enable investigators to explore and identify transcriptional mechanisms contributing to neurodegenerative diseases.
Subject(s)
Alzheimer Disease/genetics , Genome, Human , Neurodegenerative Diseases/genetics , Transcriptome , Genome-Wide Association Study , Humans
ABSTRACT
Over 100 genetic loci harbor schizophrenia-associated variants, yet how these variants confer liability is uncertain. The CommonMind Consortium sequenced RNA from dorsolateral prefrontal cortex of people with schizophrenia (N = 258) and control subjects (N = 279), creating a resource of gene expression and its genetic regulation. Using this resource, ~20% of schizophrenia loci have variants that could contribute to altered gene expression and liability. In five loci, only a single gene was involved: FURIN, TSNARE1, CNTN4, CLCN3 or SNAP91. Altering expression of FURIN, TSNARE1 or CNTN4 changed neurodevelopment in zebrafish; knockdown of FURIN in human neural progenitor cells yielded abnormal migration. Of 693 genes showing significant case-versus-control differential expression, their fold changes were ≤ 1.33, and an independent cohort yielded similar results. Gene co-expression implicates a network relevant for schizophrenia. Our findings show that schizophrenia is polygenic and highlight the utility of this resource for mechanistic interpretations of genetic liability for brain diseases.
Subject(s)
Gene Expression Regulation/genetics , Genetic Predisposition to Disease , Multifactorial Inheritance/genetics , Schizophrenia/genetics , Brain/metabolism , Female , Genome-Wide Association Study , Humans , Male , Polymorphism, Single Nucleotide , Risk
ABSTRACT
All aspects of plant and animal development are controlled by complex networks of transcription factors. Transcription factors are essential for converting signaling inputs, such as changes in daylength, into complex gene regulatory outputs. While some transcription factors control gene expression by binding to cis-regulatory elements as individual subunits, others function in a combinatorial fashion. How individual subunits of combinatorial transcription factors are spatially and temporally deployed (e.g. expression-level, posttranslational modifications and subcellular localization) has profound effects on their control of gene expression. In the model plant Arabidopsis (Arabidopsis thaliana), we have identified 36 Nuclear Factor Y (NF-Y) transcription factor subunits (10 NF-YA, 13 NF-YB, and 13 NF-YC subunits) that can theoretically combine to form 1,690 unique complexes. Individual plant subunits have functions in flowering time, embryo maturation, and meristem development, but how they combine to control these processes is unknown. To assist in the process of defining unique NF-Y complexes, we have created promoter:beta-glucuronidase fusion lines for all 36 Arabidopsis genes. Here, we show NF-Y expression patterns inferred from these promoter:beta-glucuronidase lines for roots, light- versus dark-grown seedlings, rosettes, and flowers. Additionally, we review the phylogenetic relationships and examine protein alignments for each NF-Y subunit family. The results are discussed with a special emphasis on potential roles for NF-Y subunits in photoperiod-controlled flowering time.
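The figure of 1,690 unique complexes follows from choosing one subunit from each of the three families (10 x 13 x 13). A minimal sketch of that count is shown below, assuming each complex contains exactly one NF-YA, one NF-YB, and one NF-YC subunit; the numbered subunit labels are generated placeholders and are not meant to correspond to specific gene identifiers.

```python
from itertools import product

# Subunit counts per family, as given in the abstract.
SUBUNIT_COUNTS = {"NF-YA": 10, "NF-YB": 13, "NF-YC": 13}


def enumerate_trimers(counts):
    """List every combination of one subunit per family.

    Labels such as "NF-YA1" are placeholders generated for illustration.
    Assumes one subunit from each family per complex.
    """
    families = [
        [f"{family}{index}" for index in range(1, n + 1)]
        for family, n in counts.items()
    ]
    return list(product(*families))


trimers = enumerate_trimers(SUBUNIT_COUNTS)
print(sum(SUBUNIT_COUNTS.values()))  # total subunits: 36
print(len(trimers))                  # theoretical complexes: 1690
```

Enumerating the combinations explicitly, rather than only multiplying the counts, yields a list that could be filtered later, for example by tissue-specific expression, if such data were available.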
Subject(s)
Arabidopsis Proteins/genetics , Arabidopsis/genetics , CCAAT-Binding Factor/genetics , Gene Expression Regulation, Plant , Recombination, Genetic , Transcription Factors/genetics , Arabidopsis/classification , Arabidopsis Proteins/classification , Cloning, Molecular , Combinatorial Chemistry Techniques/methods , Multigene Family , Phylogeny
ABSTRACT
OBJECTIVES: To study the relationship between HIV-1 subtype C genetic diversity and mother-to-child transmission and to determine if transmission of HIV-1 V1/V2 env variants occurs stochastically. DESIGN: Case-case-control study of Malawian mother-infant pairs consisting of 32 nontransmitting women, 25 intrauterine transmitters, and 23 intrapartum transmitters in Blantyre, Malawi. METHODS: A heteroduplex tracking assay against the highly variable HIV-1 env V1/V2 region was used to characterize the relationship between HIV-1 diversity and mother-to-child transmission. The relative abundance of the maternal env variants was quantified and categorized as transmitted or nontransmitted based on the env variants detected in the infant plasma. The V1/V2 region was sequenced from two mother-infant pairs and a phylogenetic tree was built. RESULTS: No relationship was found between transmission and overall maternal env diversity. Infants had less diverse HIV-1 populations than their mothers, and intrauterine-infected infants had fewer V1/V2 variants and were more likely to harbor a homogeneous V1/V2 population than infants infected intrapartum. V1/V2 sequences cloned from two mother-infant transmission pairs support multiple env variant transmission when multiple variants are detected, rather than single variant transmission followed by diversification. Almost 50% of the HIV-infected infants harbored V1/V2 env variants that were not detected in maternal plasma samples. Finally, transmission of env variants was not related to their abundance in maternal blood. CONCLUSION: These data suggest that the predominant mechanism(s) of HIV-1 subtype C mother-to-child transmission differs by the timing of transmission and is unlikely to be explained by a simple stochastic model.