ABSTRACT
Electronic health records (EHRs) coupled with large-scale biobanks offer great promise for unravelling the genetic underpinnings of treatment efficacy. However, medication-induced biomarker trajectories derived from such records remain poorly studied. Here, we extract clinical and medication prescription data from EHRs and conduct GWAS and rare variant burden tests in the UK Biobank (discovery) and the All of Us program (replication) on ten cardiometabolic drug response outcomes, including lipid response to statins, HbA1c response to metformin and blood pressure response to antihypertensives (N = 740-26,669). At genome-wide significance, our findings recover previously reported pharmacogenetic signals and also include novel associations for lipid response to statins (N = 26,669) near LDLR and ZNF800. Importantly, these associations are treatment-specific and are not associated with biomarker progression in medication-naive individuals. Furthermore, we demonstrate that individuals with higher genetically determined baseline levels of low-density lipoprotein and total cholesterol experience a larger absolute, albeit smaller relative, biomarker reduction following statin treatment. In summary, we systematically investigated the common and rare pharmacogenetic contribution to cardiometabolic drug response phenotypes in over 50,000 UK Biobank and All of Us participants with EHRs and identified clinically relevant genetic predictors for improved personalized treatment strategies.
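The contrast between absolute and relative reduction can be made concrete with two hypothetical patients. The LDL cholesterol values below are invented for illustration and are not results from the study, and the helper name `ldl_response` is an assumption. The point is simply that a patient starting from a higher baseline can lose more mmol/L while losing a smaller fraction of the starting level.

```python
def ldl_response(baseline: float, on_treatment: float) -> tuple[float, float]:
    """Return (absolute reduction, relative reduction in percent).

    Hypothetical helper: inputs are LDL cholesterol in mmol/L before and
    after statin initiation.
    """
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    absolute = baseline - on_treatment
    relative = 100.0 * absolute / baseline
    return absolute, relative


# Invented example values, chosen only to illustrate the pattern.
low_baseline = ldl_response(3.0, 1.5)   # about 1.5 mmol/L, 50%
high_baseline = ldl_response(5.0, 2.8)  # about 2.2 mmol/L, 44%

# Higher baseline: larger absolute but smaller relative reduction.
print(high_baseline[0] > low_baseline[0], high_baseline[1] < low_baseline[1])
```

With these toy numbers both comparisons print `True`, which is the same qualitative pattern the abstract reports for genetically higher baseline cholesterol.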
ABSTRACT
Phasing involves separating the two parentally inherited copies of each chromosome into haplotypes. Here, we introduce SHAPEIT5, a new phasing method that quickly and accurately processes large sequencing datasets, and apply it to UK Biobank (UKB) whole-genome and whole-exome sequencing data. We demonstrate that SHAPEIT5 phases rare variants with low switch error rates, below 5% for variants present in just 1 sample out of 100,000. Furthermore, we outline a method for phasing singletons, which, although less precise, constitutes an important step towards future developments. We then demonstrate that using UKB as a reference panel improves the accuracy of genotype imputation, an improvement that is even more pronounced when the panel is phased with SHAPEIT5 rather than with other methods. Finally, we screen the UKB data for loss-of-function compound heterozygous events and identify 549 genes in which both gene copies are knocked out. These genes complement current knowledge of gene essentiality in the human genome.
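The switch error rate is a standard phasing-accuracy metric: along an individual's heterozygous sites, a switch error is counted each time the inferred haplotype flips from matching one true haplotype to matching the other. The minimal sketch below implements that common definition; it is not SHAPEIT5's own evaluation code, and the function name and the 0/1 input encoding (first haplotype only, heterozygous sites only) are assumptions.

```python
def switch_error_rate(true_hap: list[int], inferred_hap: list[int]) -> float:
    """Switch error rate between a true and an inferred haplotype.

    Hypothetical helper. Both lists hold the alleles (0/1) of the first
    haplotype at the same heterozygous sites, in genomic order; the second
    haplotype is implicitly the complement at each site.
    """
    if len(true_hap) != len(inferred_hap):
        raise ValueError("haplotypes must cover the same sites")
    if len(true_hap) < 2:
        return 0.0  # no adjacent pair of sites, so no switch is possible
    # True where the inferred haplotype carries the same allele as the truth
    agree = [t == i for t, i in zip(true_hap, inferred_hap)]
    # A switch is a change in agreement between consecutive heterozygous sites
    switches = sum(a != b for a, b in zip(agree, agree[1:]))
    return switches / (len(agree) - 1)


# One switch between the 2nd and 3rd heterozygous sites, out of 4 intervals
print(switch_error_rate([0, 1, 1, 0, 1], [0, 1, 0, 1, 0]))
```

Because a haplotype and its complement describe the same phasing, an inferred haplotype that is the exact complement of the truth scores zero switch errors under this definition.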
Subjects
Biological Specimen Banks, Human Genome, Humans, Exome Sequencing, DNA Sequence Analysis/methods, Genotype, Haplotypes, Human Genome/genetics, United Kingdom, Single Nucleotide Polymorphism/genetics
ABSTRACT
Non-coding regulatory elements such as enhancers are key in controlling the cell-type specificity and spatio-temporal expression of genes. To drive stable and precise gene transcription robust to genetic variation and environmental stress, genes are often targeted by multiple enhancers with redundant action. However, it is unknown whether enhancers targeting the same gene display simultaneous activity or whether some enhancer combinations are more often co-active than others. Here, we take advantage of recent developments in single cell technology that permit assessing chromatin status (scATAC-seq) and gene expression (scRNA-seq) in the same single cells to correlate gene expression to the activity of multiple enhancers. Measuring activity patterns across 24,844 human lymphoblastoid single cells, we find that the majority of enhancers associated with the same gene display significant correlation in their chromatin profiles. For 6944 expressed genes associated with enhancers, we predict 89,885 significant enhancer-enhancer associations between nearby enhancers. We find that associated enhancers share similar transcription factor binding profiles and that gene essentiality is linked with higher enhancer co-activity. We provide a set of predicted enhancer-enhancer associations based on correlation derived from a single cell line, which can be further investigated for functional relevance.
Subjects
Chromatin, Genetic Enhancer Elements, Humans, Chromatin/genetics, Cell Line
ABSTRACT
Studying the interplay between genetic variation, epigenetic changes and regulation of gene expression is crucial to understanding how cellular states change in various conditions, including immune diseases. In this study, we characterize cell-specificity in three key cell types of the human immune system by building cis maps of regulatory regions with coordinated activity (CRDs) from ChIP-seq peaks and methylation data. We find that only 33% of CRD-gene associations are shared between cell types, revealing how similarly located regulatory regions provide cell-specific modulation of gene activity. Most of our associations are enriched in cell-specific transcription factor binding sites, blood-trait loci and immune disease-associated loci, pointing to important biological mechanisms. Notably, we show that CRD-QTLs aid in interpreting GWAS findings and help prioritize variants for testing functional hypotheses within human complex diseases. Additionally, we map trans CRD regulatory associations; among the 207 trans-eQTLs discovered, 46 overlap with the QTLGen Consortium meta-analysis in whole blood, showing that mapping functional regulatory units using population genomics can uncover important mechanisms of gene expression regulation in immune cells. Finally, we provide a comprehensive resource describing multi-omics changes to gain a greater understanding of cell-type-specific regulatory mechanisms of immunity.
Subjects
Quantitative Trait Loci, Nucleic Acid Regulatory Sequences, Humans, Nucleic Acid Regulatory Sequences/genetics, Genetic Epigenesis, Phenotype, Genetic Variation
ABSTRACT
Identical genetic variants can have different phenotypic effects depending on their parent of origin. Yet studies of parent-of-origin effects have been limited in sample size by the lack of parental genomes or known genealogies. We propose a probabilistic approach to infer the parent of origin of individual alleles that requires neither parental genomes nor prior knowledge of genealogy. Our model uses Identity-By-Descent sharing with second- and third-degree relatives to assign alleles to parental groups and leverages chromosome X data in males to distinguish maternal from paternal groups. We combine this with robust haplotype inference and haploid imputation to infer the parent of origin for 26,393 UK Biobank individuals. We screen 99 phenotypes for parent-of-origin effects and replicate the discoveries of 6 GWAS, confirming signals for body mass index, type 2 diabetes, standing height and multiple blood biomarkers, including the known maternal effect at the MEG3/DLK1 locus on platelet phenotypes. We also report a novel maternal effect at the TERT gene on telomere length, thereby providing new insights into the heritability of this phenotype. All our summary statistics are publicly available to help the community better characterize the molecular mechanisms leading to parent-of-origin effects and their implications for human health.
Subjects
Type 2 Diabetes Mellitus, Humans, Male, Alleles, Biological Specimen Banks, Genome-Wide Association Study, Phenotype, Female
ABSTRACT
[This corrects the article DOI: 10.1371/journal.pntd.0005559.].
ABSTRACT
Comparing transcript levels between healthy and diseased individuals allows the identification of differentially expressed genes, which may be causes, consequences or mere correlates of the disease under scrutiny. We propose a method to decompose the observational correlation between gene expression and phenotypes into components driven by confounders, forward causal effects and reverse causal effects. The bi-directional causal effects between gene expression and complex traits are obtained by Mendelian randomization integrating summary-level data from GWAS and whole-blood eQTLs. Applying this approach to complex traits reveals that forward effects contribute negligibly. For example, the correlation coefficients between gene expression and BMI or triglycerides (TG) robustly correlate with trait-to-expression causal effects (r_BMI = 0.11, P_BMI = 2.0 × 10^-51; r_TG = 0.13, P_TG = 1.1 × 10^-68), but not detectably with expression-to-trait effects. Our results demonstrate that studies comparing the transcriptomes of diseased and healthy subjects are more prone to reveal disease-induced gene expression changes than disease-causing ones.
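The abstract does not name the Mendelian randomization estimator. A widely used choice for summary-level data is the fixed-effect inverse-variance weighted (IVW) estimator, which the sketch below assumes; the function name and the per-SNP effect sizes are invented for illustration. Running it once with expression as the exposure and the trait as the outcome, and once in the reverse direction with trait-associated instruments, would give the forward and reverse effects contrasted above.

```python
import math


def ivw_estimate(beta_exp: list[float], beta_out: list[float],
                 se_out: list[float]) -> tuple[float, float]:
    """Fixed-effect inverse-variance weighted MR estimate and its SE.

    Hypothetical helper.
    beta_exp: per-instrument effects on the exposure
    beta_out: per-instrument effects on the outcome
    se_out:   standard errors of beta_out
    """
    if not beta_exp or not (len(beta_exp) == len(beta_out) == len(se_out)):
        raise ValueError("inputs must be non-empty and of equal length")
    # Weighted mean of per-SNP ratios by/bx with weights bx^2 / se_y^2,
    # which simplifies to sum(bx*by/se^2) / sum(bx^2/se^2)
    num = sum(bx * by / s ** 2 for bx, by, s in zip(beta_exp, beta_out, se_out))
    den = sum(bx ** 2 / s ** 2 for bx, s in zip(beta_exp, se_out))
    return num / den, math.sqrt(1.0 / den)


# Invented summary statistics for three independent instruments
estimate, se = ivw_estimate([0.1, 0.2, 0.3], [0.05, 0.10, 0.15],
                            [0.01, 0.01, 0.01])
```

Here every instrument implies the same ratio of 0.5, so the IVW estimate is about 0.5; with real data, heterogeneity between instruments would call for sensitivity analyses such as MR-Egger or median-based estimators.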
Subjects
Algorithms, Gene Expression Profiling/methods, Genetic Predisposition to Disease/genetics, Genome-Wide Association Study/methods, Single Nucleotide Polymorphism, Transcriptome/genetics, Causality, Genetic Association Studies/methods, Humans, Mendelian Randomization Analysis/methods, Phenotype, Quantitative Trait Loci/genetics
ABSTRACT
Nearby genes are often expressed as a group. Yet the prevalence, molecular mechanisms and genetic control of local gene co-expression are far from understood. Here, by leveraging gene expression measurements across 49 human tissues and hundreds of individuals, we find that local gene co-expression occurs in 13% to 53% of genes per tissue. By integrating various molecular assays (e.g. ChIP-seq and Hi-C), we estimate how well several mechanisms, such as enhancer-gene interactions, distinguish gene pairs that are co-expressed from those that are not. Notably, we identify 32,636 expression quantitative trait loci (eQTLs) that associate with co-expressed gene pairs and often overlap enhancer regions. Because they affect several genes, these eQTLs are more often associated with multiple human traits than other eQTLs. Our study paves the way to understanding trait pleiotropy and to the functional interpretation of QTL and GWAS findings. All local gene co-expression identified here is available through a public database ( https://glcoex.unil.ch/ ).
Subjects
Gene Expression Regulation, Genetic Pleiotropy/genetics, Human Genome/genetics, Genome-Wide Association Study/methods, Single Nucleotide Polymorphism, Quantitative Trait Loci/genetics, Binding Sites/genetics, Gene Ontology, Genetic Association Studies/methods, Humans, Nucleic Acid Regulatory Sequences/genetics, Transcription Factors/metabolism
ABSTRACT
Low-coverage whole-genome sequencing followed by imputation has been proposed as a cost-effective genotyping approach for disease and population genetics studies. However, its competitiveness against SNP arrays is undermined because current imputation methods are computationally expensive and unable to leverage large reference panels. Here, we describe a method, GLIMPSE, for phasing and imputation of low-coverage sequencing datasets from modern reference panels. We demonstrate its remarkable performance across different coverages and human populations. GLIMPSE achieves imputation of a genome for less than US$1 in computational cost, considerably outperforming other methods and improving imputation accuracy over the full allele frequency range. As a proof of concept, we show that 1× coverage enables effective gene expression association studies and outperforms dense SNP arrays in rare variant burden tests. Overall, this study illustrates the promising potential of low-coverage imputation and suggests a paradigm shift in the design of future genomic studies.
Subjects
DNA Sequence Analysis, Human Genome, Genotype, Humans, Likelihood Functions, Single Nucleotide Polymorphism/genetics, Reference Standards
ABSTRACT
Multifunctional proteins often perform their different functions when localized in different subcellular compartments. However, the mechanisms leading to their localization are largely unknown. Recently, 3'UTRs were found to regulate the cellular localization of newly synthesized proteins through the formation of 3'UTR-protein complexes. Here, we investigate the formation of 3'UTR-protein complexes involving multifunctional proteins by exploiting large-scale protein-protein and protein-RNA interaction networks. Focusing on 238 human 'extreme multifunctional' (EMF) proteins, we predicted 1411 3'UTR-protein complexes involving 54% of those proteins and evaluated their role in regulating protein cellular localization and multifunctionality. We find that EMF proteins lacking localization addressing signals, yet present at both the nucleus and the cell surface, often form 3'UTR-protein complexes, and that the formation of these complexes could provide EMF proteins with the diversity of interaction partners necessary for their multifunctionality. Our findings are reinforced by archetypal moonlighting proteins predicted to form 3'UTR-protein complexes. Finally, the formation of 3'UTR-protein complexes, which involves up to 17% of the proteins in the human protein-protein interaction network, may be a common yet underestimated protein trafficking mechanism, particularly suited to regulating the localization of multifunctional proteins.
Subjects
3' Untranslated Regions, Membrane Proteins/metabolism, Protein Interaction Maps, Messenger RNA/metabolism, RNA-Binding Proteins/metabolism, Humans, Membrane Proteins/chemistry, Protein Binding, Protein Biosynthesis, Protein Sorting Signals, Protein Transport, Messenger RNA/chemistry, Messenger RNA/genetics, RNA-Binding Proteins/chemistry
ABSTRACT
Moonlighting proteins perform multiple unrelated functions without any change in polypeptide sequence. They can coordinate cellular activities, serving as switches between pathways and helping cells respond to changes in their environment. Therefore, regulation of these multiple protein activities in space and time is likely to be important for the homeostasis of biological systems. Some moonlighting proteins may perform their multiple functions simultaneously, while others alternate between functions in response to certain triggers. The switch between a moonlighting protein's functions can be regulated by several distinct factors, including the binding of other molecules such as proteins. Here we review the approaches used to identify moonlighting proteins and the existing repositories. We particularly emphasise the role played by short linear motifs and post-translational modifications (PTMs) as regulatory switches of moonlighting functions.
Subjects
Proteins/metabolism, Animals, Cell Physiological Phenomena/physiology, Protein Databases, Humans, Protein Conformation
ABSTRACT
The coordination of the synthesis of functionally related proteins can be achieved at the post-transcriptional level by the action of common regulatory molecules, such as RNA-binding proteins (RBPs). Despite advances in the genome-wide identification of RBPs and their binding transcripts, the protein-RNA interaction space is still largely unexplored, which hinders a broader understanding of the extent of post-transcriptional regulation of related coding RNAs. Here, we propose a computational approach that combines protein-mRNA interaction networks and statistical analyses to provide an inferred regulatory landscape for more than 800 human RBPs and to identify the cellular processes that can be regulated at the post-transcriptional level. We show that 10% of the tested sets of functionally related mRNAs can be post-transcriptionally regulated. Moreover, we propose a classification of (i) the RBPs and (ii) the functionally related mRNAs, based on their distinct behaviors in the functional landscape, which suggests mechanistic regulatory hypotheses. In addition, we demonstrate the usefulness of the inferred functional landscape for investigating the cellular role of both well-characterized and novel RBPs in the context of human diseases.
Subjects
Post-Transcriptional RNA Processing, RNA-Binding Proteins/metabolism, Gene Expression Regulation, Humans, Protein Interaction Maps, Messenger RNA/physiology, Regulon, Transcriptome
ABSTRACT
MoonDB 2.0 (http://moondb.hb.univ-amu.fr/) is a database of predicted and manually curated extreme multifunctional (EMF) and moonlighting proteins, i.e. proteins that perform multiple unrelated functions. We have previously shown that such proteins can be predicted through the analysis of their molecular interaction subnetworks, their functional annotations and their association to distinct groups of proteins that are involved in unrelated functions. In MoonDB 2.0, we updated the set of human EMF proteins (238 proteins), using the latest functional annotations and protein-protein interaction networks. Furthermore, for the first time, we applied our method to four additional model organisms - mouse, fly, worm and yeast - and identified 54 novel EMF proteins in these species. In addition to novel predictions, this update contains 63 human and yeast proteins that were manually curated from literature, including descriptions of moonlighting functions and associated references. Importantly, MoonDB's interface was fully redesigned and improved, and its entries are now cross-referenced in the UniProt Knowledgebase (UniProtKB). MoonDB will be updated once a year with the novel EMF candidates calculated from the latest available protein interactions and functional annotations.
Subjects
Protein Databases, Animals, Caenorhabditis elegans/genetics, Data Curation, Drosophila melanogaster/genetics, Gene Ontology, Humans, Mice, Molecular Sequence Annotation, Protein Interaction Mapping, User-Computer Interface, Yeasts/genetics
ABSTRACT
The human transcriptome contains thousands of long non-coding RNAs (lncRNAs). Characterizing their function is a current challenge. An emerging concept is that lncRNAs serve as protein scaffolds, forming ribonucleoproteins and bringing proteins into proximity. However, only a few scaffolding lncRNAs have been characterized, and the prevalence of this function is unknown. Here, we propose the first computational approach aimed at predicting scaffolding lncRNAs at large scale. We predicted the largest human lncRNA-protein interaction network to date using the catRAPID omics algorithm. In combination with tissue expression and statistical approaches, we identified 847 lncRNAs (~5% of the long non-coding transcriptome) predicted to scaffold half of the known protein complexes and network modules. Lastly, we show that the association of certain lncRNAs with disease may involve their scaffolding ability. Overall, our results suggest for the first time that RNA-mediated scaffolding of protein complexes and modules may be a common mechanism in human cells.
Subjects
Computational Biology/methods, Long Noncoding RNA/metabolism, RNA-Binding Proteins/metabolism, Ribonucleoproteins/metabolism, Algorithms, Genetic Predisposition to Disease/genetics, Humans, Protein Binding, Protein Interaction Maps, Proteome/genetics, Proteome/metabolism, Long Noncoding RNA/genetics, RNA-Binding Proteins/genetics, Ribonucleoproteins/genetics, Transcriptome
ABSTRACT
Schistosomes are parasitic helminths that cause schistosomiasis, a disease affecting circa 200 million people, primarily in underprivileged regions of the world. Schistosoma mansoni is the most experimentally tractable schistosome species owing to its ease of propagation in the laboratory and the high quality of its genome assembly and annotation. Although there is growing interest in microRNAs (miRNAs) in trematodes, little is known about the role these molecules play in developmental processes. We use the unbiased, "miRNA-blind" bioinformatics tool Sylamer to analyse the 3'-UTRs of transcripts differentially expressed between the juvenile and adult stages. We show that the miR-277/4989 family target sequence is the only one significantly enriched in the transition from juvenile to adult worms. Further, we describe a novel miRNA, sma-miR-4989, show that its genomic proximity to sma-miR-277 suggests that the two form a miRNA cluster, and propose hairpin folds for both miRNAs compatible with the miRNA pathway. In addition, we found that the sma-miR-277/4989 miRNAs are up-regulated in adults, while their predicted targets are significantly down-regulated in paired adult worms but remain largely undisturbed in immature "virgin" females. Finally, we show that sma-miR-4989 is expressed in tegumental cells located proximal to the oesophageal gland and is also distributed throughout the male worm's body. Our results indicate that sma-miR-277/4989 might play a dominant role in post-transcriptional regulation during the development of juvenile worms and suggest an important role in the sexual development of female schistosomes.
Subjects
Gene Expression Regulation, MicroRNAs/metabolism, Schistosoma mansoni/growth & development, Schistosoma mansoni/genetics, Genetic Transcription, Animals, Computational Biology, Female, Male, Inbred BALB C Mice, MicroRNAs/genetics
ABSTRACT
Soil-transmitted nematodes, including the Strongyloides genus, cause one of the most prevalent neglected tropical diseases. Here we compare the genomes of four Strongyloides species, including the human pathogen Strongyloides stercoralis, and their close relatives that are facultatively parasitic (Parastrongyloides trichosuri) and free-living (Rhabditophanes sp. KR3021). A significant paralogous expansion of key gene families (those encoding astacin-like and SCP/TAPS proteins) is associated with the evolution of parasitism in this clade. Exploiting the unique Strongyloides life cycle, we compare the transcriptomes of the parasitic and free-living stages and find that these same gene families are upregulated in the parasitic stages, underscoring their role in nematode parasitism.
Subjects
Genomics, Strongyloides/genetics, Strongyloidiasis/genetics, Symbiosis/genetics, Animals, Biological Evolution, Humans, Life Cycle Stages/genetics, Strongyloides/pathogenicity, Strongyloidiasis/parasitology, Transcriptome/genetics
ABSTRACT
BACKGROUND: Sparganosis is an infection with a larval Diphyllobothriidea tapeworm. From a rare cerebral case presented at a clinic in the UK, DNA was recovered from a biopsy sample and used to identify the causative species as Spirometra erinaceieuropaei through sequencing of the cox1 gene. From the same DNA, we have produced a draft genome, the first of its kind for this species, and used it to perform a comparative genomics analysis and to investigate known and potential tapeworm drug targets in this species. RESULTS: The 1.26 Gb draft genome of S. erinaceieuropaei is currently the largest reported for any flatworm. Through investigation of β-tubulin genes, we predict that S. erinaceieuropaei larvae are insensitive to the tapeworm drug albendazole. We find that many putative tapeworm drug targets are also present in S. erinaceieuropaei, allowing possible cross-application of new drugs. In comparison with other sequenced tapeworm species, we observe expansion of protease classes and of Kunitz-type protease inhibitors. Expanded gene families in this tapeworm also include those involved in processes that add post-translational diversity to the protein landscape, intracellular transport, transcriptional regulation and detoxification. CONCLUSIONS: The S. erinaceieuropaei genome begins to give us insight into an order of tapeworms previously uncharacterized at the genome-wide level. From a single clinical case we have begun to sketch a picture of the characteristics of these organisms. Finally, our work represents a significant technological achievement, as we present a draft genome sequence of a rare tapeworm from a small amount of starting material.
Subjects
Diphyllobothrium/genetics, Genome, Sparganosis/genetics, Spirometra/genetics, Animals, Base Sequence, Biopsy, Brain/parasitology, Brain/pathology, High-Throughput Nucleotide Sequencing, Humans, Sparganosis/parasitology, Spirometra/parasitology, United Kingdom
ABSTRACT
In this work, the influence of imidazolium ionic liquids (ILs) on biochemical parameters that affect the in vivo behavior of nimesulide was evaluated. In this context, the binding of nimesulide to human serum albumin (HSA) in IL media was studied. In parallel, the interaction of drug-IL systems with micelles of hexadecylphosphocholine (HDPC) was evaluated, enabling the calculation of partition coefficients (Kp). Both assays were performed in buffered media in the absence and in the presence of 1% emim[BF4], emim[Ms] and emim[TfMs]. Even though the dissociation constant (Kd) increased in IL media, nimesulide still binds to HSA through strong interactions. The thermodynamic analysis indicates that the interaction is spontaneous for all the tested systems. Moreover, the studied systems exhibited properties favorable to the interaction of the drug with biological membranes, with Kp values 2.5 to 3.5 times higher than in an aqueous environment. The studied nimesulide-IL systems presented promising characteristics regarding the absorption and distribution of the drug in vivo, so the studied solvents seem to be good options for drug delivery.