Results 1 - 20 of 20
1.
Cell ; 186(7): 1493-1511.e40, 2023 03 30.
Article in English | MEDLINE | ID: mdl-37001506

ABSTRACT

Understanding how genetic variants impact molecular phenotypes is a key goal of functional genomics, currently hindered by reliance on a single haploid reference genome. Here, we present the EN-TEx resource of 1,635 open-access datasets from four donors (∼30 tissues × ∼15 assays). The datasets are mapped to matched, diploid genomes with long-read phasing and structural variants, instantiating a catalog of >1 million allele-specific loci. These loci exhibit coordinated activity along haplotypes and are less conserved than corresponding, non-allele-specific ones. Surprisingly, a deep-learning transformer model can predict the allele-specific activity based only on local nucleotide-sequence context, highlighting the importance of transcription-factor-binding motifs particularly sensitive to variants. Furthermore, combining EN-TEx with existing genome annotations reveals strong associations between allele-specific and GWAS loci. It also enables models for transferring known eQTLs to difficult-to-profile tissues (e.g., from skin to heart). Overall, EN-TEx provides rich data and generalizable models for more accurate personal functional genomics.


Subjects
Epigenome, Quantitative Trait Loci, Genome-Wide Association Study, Genomics, Phenotype, Single Nucleotide Polymorphism
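The EN-TEx abstract reports a catalog of allele-specific loci but does not state the statistical test behind it. A common generic approach, assumed here rather than taken from the paper, is an exact two-sided binomial test of reference versus alternative reads at a heterozygous site against a 50:50 expectation. `binom_pmf`, `binomial_two_sided_p` and `is_allele_specific` are hypothetical helper names, and the significance threshold is illustrative.

```python
from math import exp, lgamma, log


def binom_pmf(i: int, n: int, p: float) -> float:
    """Binomial probability mass, computed in log space to avoid overflow at high read depth."""
    log_coef = lgamma(n + 1) - lgamma(i + 1) - lgamma(n - i + 1)
    return exp(log_coef + i * log(p) + (n - i) * log(1 - p))


def binomial_two_sided_p(k: int, n: int, p: float = 0.5) -> float:
    """Exact two-sided p-value: total probability of every outcome
    that is no more likely than the observed count k."""
    observed = binom_pmf(k, n, p)
    total = sum(
        q
        for q in (binom_pmf(i, n, p) for i in range(n + 1))
        if q <= observed * (1 + 1e-7)  # small tolerance so symmetric ties are kept
    )
    return min(1.0, total)


def is_allele_specific(ref_reads: int, alt_reads: int, alpha: float = 0.05) -> bool:
    """Hypothetical rule: flag a heterozygous site whose ref/alt split
    deviates from the 50:50 expectation at significance level alpha."""
    n = ref_reads + alt_reads
    if n == 0:
        return False
    return binomial_two_sided_p(ref_reads, n) < alpha
```

For example, `is_allele_specific(45, 5)` flags a strongly skewed site, while a balanced 25:25 split is not flagged. A genome-wide catalog would additionally need multiple-testing correction and handling of mapping bias, which this sketch omits.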
2.
Nucleic Acids Res ; 52(D1): D1033-D1041, 2024 Jan 05.
Article in English | MEDLINE | ID: mdl-37904591

ABSTRACT

The brain is composed of heterogeneous types of neuronal and non-neuronal cells, which are organized into distinct anatomical regions and show precise regulation of gene expression during development, aging and function. In the current database release, STAB2 provides a systematic cellular map of the human and mouse brain by integrating recently published large-scale single-cell and single-nucleus RNA-sequencing datasets from diverse regions and across the lifespan. We applied a hierarchical strategy of unsupervised clustering to the integrated single-cell transcriptomic datasets to precisely annotate the cell types and subtypes in the human and mouse brain. Currently, STAB2 includes 71 and 61 different cell subtypes defined in the human and mouse brain, respectively. It covers 63 subregions and 15 developmental stages of the human brain, and 38 subregions and 30 developmental stages of the mouse brain, generating a comprehensive atlas for exploring spatiotemporal transcriptomic dynamics in the mammalian brain. We also enhanced the web interfaces for querying and visualizing gene expression in specific cell types. STAB2 is freely available at https://mai.fudan.edu.cn/stab2.


Subjects
Brain, Genetic Databases, Neurons, Single-Cell Gene Expression Analysis, Animals, Humans, Mice, Atlases as Topic, Brain/cytology, Brain/growth & development, Brain/metabolism, Neurons/metabolism, Transcriptome, Datasets as Topic
3.
PLoS Genet ; 19(12): e1011112, 2023 Dec.
Article in English | MEDLINE | ID: mdl-38150468

ABSTRACT

Mendelian randomization (MR) is an effective approach for revealing causal risk factors that underpin complex traits and diseases. Although MR has mostly been applied in two-sample settings, applying it within a single large cohort is increasingly promising given the rise of biobank-scale datasets that contain genotype data, brain imaging data, and complex traits measured in the same individuals. However, most existing multivariable MR methods have been developed for the two-sample setting or for a small number of exposures. In this study, we introduce a one-sample multivariable MR method based on partial least squares and Lasso regression (MR-PL). MR-PL accounts for the correlation among exposures (e.g., brain imaging features) even when the number of exposures is very large, while also correcting for winner's curse bias. Extensive and systematic simulations demonstrated the robustness and reliability of our method and confirmed that MR-PL generates more precise causal estimates with lower false positive rates than alternative approaches. Finally, we applied MR-PL to UK Biobank data to reveal the causal effects of 36 white matter tracts on 180 complex traits, and identified putative white matter tracts implicated in smoking, blood vascular function-related traits, and eating behaviors.


Subjects
Biological Specimen Banks, Mendelian Randomization Analysis, Humans, Mendelian Randomization Analysis/methods, Multifactorial Inheritance, Reproducibility of Results, Neuroimaging, Genome-Wide Association Study/methods, Single Nucleotide Polymorphism
4.
Brief Bioinform ; 24(2)2023 03 19.
Article in English | MEDLINE | ID: mdl-36847697

ABSTRACT

Brain imaging genomics is an emerging interdisciplinary field in which the integrated analysis of multimodal medical image-derived phenotypes (IDPs) and multi-omics data bridges the gap between macroscopic brain phenotypes and their cellular and molecular characteristics. This approach aims to better interpret the genetic architecture and molecular mechanisms associated with brain structure, function and clinical outcomes. More recently, the availability of large-scale imaging and multi-omics datasets from the human brain has made it possible to discover common genetic variants contributing to the structural and functional IDPs of the human brain. Through integrative analyses with functional multi-omics data from the human brain, a set of critical genes, functional genomic regions and neuronal cell types have been identified as significantly associated with brain IDPs. Here, we review the recent advances in the methods and applications of multi-omics integration in brain imaging analysis. We highlight the importance of functional genomic datasets in understanding the biological functions of the identified genes and cell types that are associated with brain IDPs. Moreover, we summarize well-known neuroimaging genetics datasets and discuss challenges and future directions in this field.


Subjects
Brain, Genomics, Humans, Genomics/methods, Brain/diagnostic imaging, Brain/metabolism, Phenotype, Neuroimaging/methods
5.
PLoS Comput Biol ; 19(7): e1011222, 2023 Jul.
Article in English | MEDLINE | ID: mdl-37410793

ABSTRACT

The COVID-19 pandemic caused by the SARS-CoV-2 virus has resulted in millions of deaths worldwide. The disease presents with various manifestations that can vary in severity and long-term outcomes. Previous efforts have contributed to the development of effective strategies for treatment and prevention by uncovering the mechanism of viral infection. We now know all the direct protein-protein interactions that occur during the lifecycle of SARS-CoV-2 infection, but it is critical to move beyond these known interactions to a comprehensive understanding of the "full interactome" of SARS-CoV-2 infection, which incorporates human microRNAs (miRNAs), additional human protein-coding genes, and exogenous microbes. This could help in developing new drugs to treat COVID-19, in differentiating the nuances of long COVID, and in identifying histopathological signatures in SARS-CoV-2-infected organs. To construct the full interactome, we developed a statistical modeling approach called MLCrosstalk (multiple-layer crosstalk) based on latent Dirichlet allocation. MLCrosstalk integrates data from multiple sources, including microbes, human protein-coding genes, miRNAs, and human protein-protein interactions. It constructs "topics" that group SARS-CoV-2 with genes and microbes based on similar patterns of co-occurrence across patient samples. We use these topics to infer linkages between SARS-CoV-2 and protein-coding genes, miRNAs, and microbes. We then refine these initial linkages using network propagation to contextualize them within a larger framework of network and pathway structures. Using MLCrosstalk, we identified genes in the IL1-processing and VEGFA-VEGFR2 pathways that are linked to SARS-CoV-2. We also found that Rothia mucilaginosa and Prevotella melaninogenica are, respectively, positively and negatively correlated with SARS-CoV-2 abundance, a finding corroborated by analysis of single-cell sequencing data.


Subjects
COVID-19, MicroRNAs, Humans, SARS-CoV-2/genetics, Post-Acute COVID-19 Syndrome, Pandemics/prevention & control, MicroRNAs/genetics
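MLCrosstalk refines its initial linkages with network propagation. The abstract does not say which propagation scheme is used; one widely used form is random walk with restart (RWR), sketched below as a generic illustration. The restart probability, the toy graph and the function name `random_walk_with_restart` are assumptions for this example only.

```python
def random_walk_with_restart(adjacency, seeds, restart=0.5, tol=1e-10, max_iter=1000):
    """Propagate seed scores over an undirected graph (generic RWR sketch).

    adjacency: dict mapping node -> set of neighbouring nodes (undirected).
    seeds: dict mapping node -> initial score, e.g. an initial linkage strength.
    Iterates p <- (1 - restart) * W p + restart * p0, where W spreads each
    node's score evenly over its neighbours (column-normalised adjacency).
    """
    nodes = list(adjacency)
    total = sum(seeds.values())
    p0 = {v: seeds.get(v, 0.0) / total for v in nodes}
    p = dict(p0)
    for _ in range(max_iter):
        nxt = {}
        for v in nodes:
            # Mass flowing into v: each neighbour u shares its score over its degree.
            inflow = sum(p[u] / len(adjacency[u]) for u in adjacency[v])
            nxt[v] = (1 - restart) * inflow + restart * p0[v]
        if sum(abs(nxt[v] - p[v]) for v in nodes) < tol:
            return nxt
        p = nxt
    return p


# Toy path graph A - B - C - D, seeded only at A.
graph = {"A": {"B"}, "B": {"A", "C"}, "C": {"B", "D"}, "D": {"C"}}
scores = random_walk_with_restart(graph, {"A": 1.0})
```

On this path graph the stationary scores decay with distance from the seed (A ≈ 26/45, D ≈ 1/45). A larger restart probability keeps scores closer to the initial linkages, which is the main design knob in this kind of refinement.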
6.
Nucleic Acids Res ; 50(D1): D287-D294, 2022 01 07.
Article in English | MEDLINE | ID: mdl-34403477

ABSTRACT

RNA-binding proteins (RBPs) play key roles in post-transcriptional regulation. Accurate identification of RBP binding sites in multiple cell lines and tissue types from diverse species is a fundamental endeavor towards understanding the regulatory mechanisms of RBPs under both physiological and pathological conditions. Our POSTAR annotation processes use publicly available large-scale CLIP-seq datasets and external functional genomic annotations to generate a comprehensive map of RBP binding sites and their association with other regulatory events as well as functional variants. Here, we present POSTAR3, an updated database with improvements in data collection, annotation infrastructure, and analysis that support the annotation of post-transcriptional regulation in multiple species. Specifically, we comprehensively updated the CLIP-seq and Ribo-seq datasets, which now cover more biological conditions, technologies, and species; added RNA secondary structure profiling for RBP binding sites; provided miRNA-mediated degradation events validated by degradome-seq; included RBP binding sites at circRNA junction regions; and expanded the annotation of RBP binding sites, particularly using updated genomic variants and disease-associated mutations. POSTAR3 is freely available at http://postar.ncrnalab.org.


Subjects
Genetic Databases, MicroRNAs/genetics, Post-Transcriptional RNA Processing, Circular RNA/genetics, RNA-Binding Proteins/genetics, Software, Animals, Arabidopsis/genetics, Arabidopsis/metabolism, Binding Sites, Cell Line, Datasets as Topic, Humans, Internet, MicroRNAs/classification, MicroRNAs/metabolism, Molecular Sequence Annotation, Nucleic Acid Conformation, Circular RNA/classification, Circular RNA/metabolism, RNA-Binding Proteins/classification, RNA-Binding Proteins/metabolism, RNA Sequence Analysis
7.
Nucleic Acids Res ; 49(D1): D916-D923, 2021 01 08.
Article in English | MEDLINE | ID: mdl-33270111

ABSTRACT

The GENCODE project annotates human and mouse genes and transcripts supported by experimental data with high accuracy, providing a foundational resource that supports genome biology and clinical genomics. GENCODE annotation processes make use of primary data and bioinformatic tools and analysis generated both within the consortium and externally to support the creation of transcript structures and the determination of their function. Here, we present improvements to our annotation infrastructure, bioinformatics tools, and analysis, and the advances they support in the annotation of the human and mouse genomes including: the completion of first pass manual annotation for the mouse reference genome; targeted improvements to the annotation of genes associated with SARS-CoV-2 infection; collaborative projects to achieve convergence across reference annotation databases for the annotation of human and mouse protein-coding genes; and the first GENCODE manually supervised automated annotation of lncRNAs. Our annotation is accessible via Ensembl, the UCSC Genome Browser and https://www.gencodegenes.org.


Subjects
COVID-19/prevention & control, Computational Biology/methods, Genetic Databases, Genomics/methods, Molecular Sequence Annotation/methods, SARS-CoV-2/genetics, Animals, COVID-19/epidemiology, COVID-19/virology, Epidemics, Humans, Internet, Mice, Pseudogenes/genetics, Long Noncoding RNA/genetics, SARS-CoV-2/metabolism, SARS-CoV-2/physiology, Genetic Transcription/genetics
8.
BMC Med ; 20(1): 266, 2022 08 29.
Article in English | MEDLINE | ID: mdl-36031604

ABSTRACT

BACKGROUND: Alzheimer's disease (AD), a progressive neurodegenerative disease, is the most common cause of dementia worldwide. Accumulating data support a contribution of the peripheral immune system to AD pathogenesis. However, a comprehensive understanding of the molecular characteristics of peripheral immune cells in AD is lacking. METHODS: To explore alterations in cellular composition and in the intrinsic expression of individual cell types in peripheral blood, we performed cellular deconvolution in a large-scale bulk blood expression cohort and identified cell-intrinsic differentially expressed genes in individual cell types while adjusting for cellular proportions. RESULTS: We detected a significant increase in the proportion of neutrophils and a significant decrease in the proportion of B lymphocytes in AD blood; these changes replicated robustly in three other AD cohorts and with alternative algorithms. The differentially expressed genes in AD neutrophils were enriched for AD-associated pathways such as ATP metabolic process and mitochondrion organization. We also found that AD risk genes, including CD33 and IL1B, were significantly enriched in neutrophil protein-protein interaction network modules of leukocyte cell-cell activation, mitochondrion organization, and cytokine-mediated signaling pathway. Both the changes in cellular composition and the expression levels of specific genes were significantly associated with clinical and pathological alterations. A similar pattern of perturbations in the cellular proportion and gene expression levels of neutrophils was also observed in mild cognitive impairment (MCI). Moreover, we observed elevated neutrophil abundance in AD brains. CONCLUSIONS: We revealed the landscape of molecular perturbations at the cellular level in AD. These alterations highlight the putative roles of neutrophils in AD pathobiology.


Subjects
Alzheimer Disease, Cognitive Dysfunction, Neurodegenerative Diseases, Brain, Cohort Studies, Humans
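The Alzheimer's study above relies on cellular deconvolution of bulk blood expression but does not name the algorithm in its abstract. A common generic formulation, assumed here for illustration, models a bulk profile as a gene-by-cell-type signature matrix times non-negative proportions and solves a non-negative least-squares problem. The sketch uses projected gradient descent; `deconvolve` and the signature values are hypothetical.

```python
def deconvolve(bulk, signature, n_iter=2000):
    """Estimate non-negative cell-type proportions from one bulk expression profile.

    bulk: expression values, one per gene.
    signature: one row per gene, giving that gene's reference expression in each cell type.
    Minimises 0.5 * ||signature @ f - bulk||^2 subject to f >= 0 by projected
    gradient descent, then rescales f so the proportions sum to 1.
    """
    n_genes, n_types = len(signature), len(signature[0])
    # 1 / ||S||_F^2 never exceeds 1 / (largest eigenvalue of S^T S), so this step is safe.
    step = 1.0 / sum(x * x for row in signature for x in row)
    f = [1.0 / n_types] * n_types
    for _ in range(n_iter):
        residual = [sum(signature[g][j] * f[j] for j in range(n_types)) - bulk[g]
                    for g in range(n_genes)]
        grad = [sum(signature[g][j] * residual[g] for g in range(n_genes))
                for j in range(n_types)]
        # Gradient step, then projection onto the non-negative orthant.
        f = [max(0.0, f[j] - step * grad[j]) for j in range(n_types)]
    total = sum(f)
    return [x / total for x in f] if total > 0 else f


# Hypothetical 3-gene x 2-cell-type signature; the bulk profile is a 70/30 mixture.
props = deconvolve([7.3, 3.8, 5.0], [[10, 1], [2, 8], [5, 5]])
```

Real blood deconvolution uses signatures with hundreds of marker genes and more cell types, and dedicated tools add refinements such as robust regression; this sketch only shows the shared core idea of a constrained linear mixture.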
9.
PLoS Comput Biol ; 16(11): e1008291, 2020 11.
Article in English | MEDLINE | ID: mdl-33253214

ABSTRACT

Predicting mutation-induced changes in protein thermodynamic stability (ΔΔG) is of great interest in protein engineering, variant interpretation, and protein biophysics. We introduce ThermoNet, a deep, 3D-convolutional neural network (3D-CNN) designed for structure-based prediction of ΔΔGs upon point mutation. To leverage the image-processing power inherent in CNNs, we treat protein structures as if they were multi-channel 3D images. In particular, the inputs to ThermoNet are uniformly constructed as multi-channel voxel grids based on biophysical properties derived from raw atom coordinates. We train and evaluate ThermoNet with a curated data set that accounts for protein homology and is balanced with direct and reverse mutations; this provides a framework for addressing biases that have likely influenced many previous ΔΔG prediction methods. ThermoNet demonstrates performance comparable to the best available methods on the widely used Ssym test set. In addition, ThermoNet accurately predicts the effects of both stabilizing and destabilizing mutations, while most other methods exhibit a strong bias towards predicting destabilization. We further show that homology between Ssym and widely used training sets like S2648 and VariBench has likely led to overestimated performance in previous studies. Finally, we demonstrate the practical utility of ThermoNet in predicting the ΔΔGs for two clinically relevant proteins, p53 and myoglobin, and for pathogenic and benign missense variants from ClinVar. Overall, our results suggest that 3D-CNNs can model the complex, non-linear interactions perturbed by mutations, directly from biophysical properties of atoms.


Subjects
Three-Dimensional Imaging/methods, Neural Networks (Computer), Point Mutation, Proteins/chemistry, Thermodynamics, Computational Biology, Protein Stability
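ThermoNet's training set is described as balanced with direct and reverse mutations. Reverse examples rest on the antisymmetry of ΔΔG: if substituting residue X with Y at a site changes stability by ΔΔG, the reverse substitution Y to X changes it by −ΔΔG. The sketch below shows only that label augmentation step, with a hypothetical record layout (`Mutation`, `reverse`, `add_reverse_mutations`). In practice each reverse example also needs a structure of the mutant protein, represented here only by a placeholder identifier.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mutation:
    protein_id: str  # identifier of the structure the mutation is applied to
    position: int    # residue number
    wild_type: str   # one-letter code of the starting residue
    mutant: str      # one-letter code of the substituted residue
    ddg: float       # stability change for wild_type -> mutant


def reverse(m: Mutation) -> Mutation:
    """Build the reverse example using ddG(Y -> X) = -ddG(X -> Y).

    The reverse mutation starts from the mutant structure; the identifier
    scheme used for it here is a hypothetical placeholder.
    """
    mutant_structure = f"{m.protein_id}_{m.wild_type}{m.position}{m.mutant}"
    return Mutation(mutant_structure, m.position, m.mutant, m.wild_type, -m.ddg)


def add_reverse_mutations(data):
    """Return direct plus reverse examples, so the labels are symmetric about zero."""
    return list(data) + [reverse(m) for m in data]


direct = [Mutation("1ABC", 10, "A", "G", 1.5), Mutation("1ABC", 20, "L", "P", -2.0)]
balanced = add_reverse_mutations(direct)
```

Because every label appears with both signs, a model trained on the balanced set cannot reduce its loss simply by predicting destabilization, which is the bias the abstract attributes to many earlier methods.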
10.
Nucleic Acids Res ; 47(D1): D203-D211, 2019 01 08.
Article in English | MEDLINE | ID: mdl-30239819

ABSTRACT

Post-transcriptional regulation of RNAs is critical to a diverse range of cellular processes. The volume of functional genomic data focusing on the logic of post-transcriptional regulation has continued to grow in recent years. In the current database version, POSTAR2 (http://lulab.life.tsinghua.edu.cn/postar), we included the following new features and data: we updated ∼500 CLIP-seq datasets (∼1200 CLIP-seq datasets in total) from six species, including human, mouse, fly, worm, Arabidopsis and yeast; added a new module, 'Translatome', which is derived from Ribo-seq datasets and contains ∼36 million open reading frames (ORFs) in the genomes of the six species; and updated and unified post-transcriptional regulation and variation data. We also merged our CLIPdb database into POSTAR2. Finally, we improved the web interfaces for searching and visualizing protein-RNA interactions with multi-layer information. POSTAR2 will help researchers investigate the post-transcriptional regulatory logic coordinated by RNA-binding proteins and the translational landscape of cellular RNAs.


Subjects
Computational Biology, Genetic Databases, Gene Expression Regulation, Post-Transcriptional RNA Processing, Animals, Binding Sites, Computational Biology/methods, Humans, Immunoprecipitation, Molecular Sequence Annotation, Open Reading Frames, Protein Binding, RNA-Binding Proteins/metabolism, DNA Sequence Analysis, Web Browser
11.
Nucleic Acids Res ; 46(D1): D194-D201, 2018 01 04.
Article in English | MEDLINE | ID: mdl-29040625

ABSTRACT

We present RISE (http://rise.zhanglab.net), a database of RNA Interactome from Sequencing Experiments. RNA-RNA interactions (RRIs) are essential for RNA regulation and function. RISE provides a comprehensive collection of RRIs that mainly come from recent transcriptome-wide sequencing-based experiments like PARIS, SPLASH, LIGR-seq, and MARIO, as well as targeted studies like RIA-seq, RAP-RNA and CLASH. It also includes interactions aggregated from other primary databases and publications. The RISE database currently contains 328,811 RNA-RNA interactions, mainly in human, mouse and yeast. While most existing RNA databases mainly contain miRNA-target interactions, more than half of the RRIs in RISE involve mRNAs and long non-coding RNAs. We compared different RRI datasets in RISE and found limited overlap between interactions resolved by different techniques and in different cell lines, which may reflect technology-specific preferences as well as the dynamic nature of RRIs. We also analyzed the basic features of the human and mouse RRI networks and found that they tend to be scale-free, small-world, hierarchical and modular. This analysis may nominate important RNAs or RRIs for further investigation. Finally, RISE provides a Circos plot and several table views for integrative visualization, with extensive molecular and functional annotations to facilitate exploration of the biological functions of any RRI of interest.


Subjects
Nucleic Acid Databases, Animals, Gene Regulatory Networks, High-Throughput Nucleotide Sequencing, Humans, Mice, Molecular Sequence Annotation, Protein Interaction Maps, RNA/genetics, RNA/metabolism, RNA Sequence Analysis, Transcriptome, User-Computer Interface
12.
Genome Med ; 15(1): 56, 2023 07 24.
Article in English | MEDLINE | ID: mdl-37488639

ABSTRACT

BACKGROUND: Prioritizing genes that underlie complex brain disorders poses a considerable challenge. Although previous studies have found that these disorders share symptoms and heterogeneity, it has remained difficult to systematically identify the risk genes associated with them. METHODS: Using the CAGE (Cap Analysis of Gene Expression) read alignment files for 439 human cell and tissue types (including primary cells, tissues and cell lines) from the FANTOM5 project, we predicted enhancer-promoter interactions (EPIs) in each of the 439 cell and tissue types and examined their reliability. We then evaluated the genetic heritability of 17 diverse brain disorders and behavioral-cognitive phenotypes in each neural cell type, brain region, and developmental stage. Furthermore, we prioritized genes associated with brain disorders and phenotypes by leveraging the EPIs in each neural cell and tissue type, and analyzed their pleiotropy and functionality across different categories of disorders and phenotypes. Finally, we characterized the spatiotemporal expression dynamics of these associated genes in cells and tissues. RESULTS: We found that the identified EPIs showed activity specificity and network aggregation across cell and tissue types, and that transcription factors with enriched binding in neural cells, e.g., EGR1 and the SOX family, play key roles in synaptic plasticity and nerve cell development. We also discovered that most neurological disorders exhibit heritability enrichment in neural stem cells and astrocytes, while psychiatric disorders and behavioral-cognitive phenotypes exhibit enrichment in neurons. Furthermore, our identified genes recapitulated well-known risk genes, which exhibited widespread pleiotropy between psychiatric disorders and behavioral-cognitive phenotypes (e.g., FOXP2) and showed expression specificity in the neural cell types, brain regions, and developmental stages associated with disorders and phenotypes. Importantly, we showed potential associations of brain disorders with brain regions and developmental stages that have not been well studied. CONCLUSIONS: Overall, our study characterized the gene-enhancer regulatory networks and genetic mechanisms in human neural cells and tissues, and illustrated the value of reanalyzing publicly available genomic datasets.


Subjects
Brain Diseases, Humans, Reproducibility of Results, Genetic Promoter Regions, Neurons, Gene Regulatory Networks
13.
Front Psychol ; 12: 641333, 2021.
Article in English | MEDLINE | ID: mdl-33995194

ABSTRACT

This paper explores whether a student's choice of major leads to certain personality traits and the reasons for this phenomenon. Specifically, we look at evidence from two Chinese universities, both of which specialize in agricultural studies. Using the Sixteen Personality Factor (16PF) questionnaire and the Neuroticism Extraversion Openness Five-Factor Inventory (NEO-FFI) questionnaire, we collected data from two groups of students: those who study agriculture-related majors (ARM), and those who study non-agriculture-related majors (NARM). The surveys all showed no significant change in personality traits during the students' freshman year. However, after 3 years of university study, significant differences in personality traits were noted between seniors in the ARM and NARM groups. Whereas ARM seniors tended to be socially shy and lower in communicative competence, NARM seniors were better at expressing themselves and communicating with others. Although a student's choice of profession influences their personality traits, it is not the only factor. The differences between the ARM and NARM training models and curricula are also significant. Moreover, the bias against ARM in Chinese society further magnifies the differences in personality traits among students with different majors.

14.
Cell Res ; 31(5): 495-516, 2021 05.
Article in English | MEDLINE | ID: mdl-33623109

ABSTRACT

Interactions with RNA-binding proteins (RBPs) are integral to RNA function and cellular regulation, and dynamically reflect specific cellular conditions. However, presently available tools for predicting RBP-RNA interactions employ RNA sequence and/or predicted RNA structures, and therefore do not capture the condition-dependent nature of these interactions. Here, after profiling transcriptome-wide in vivo RNA secondary structures in seven cell types, we developed PrismNet, a deep learning tool that integrates experimental in vivo RNA structure data and RBP binding data for matched cells to accurately predict dynamic RBP binding in various cellular conditions. PrismNet results for 168 RBPs support its utility both for understanding CLIP-seq results and for substantially extending such interaction data to accurately analyze additional cell types. Further, PrismNet employs an "attention" strategy to computationally identify exact RBP-binding nucleotides, and we discovered enrichment among dynamic RBP-binding sites for structure-changing variants (riboSNitches), which can link genetic diseases with dysregulated RBP binding. Our rich profiling data and deep learning-based prediction tool provide access to a previously inaccessible layer of cell-type-specific RBP-RNA interactions, with clear utility for understanding and treating human diseases.


Subjects
Deep Learning, RNA, Binding Sites, Humans, Protein Binding, RNA/metabolism, RNA-Binding Proteins/genetics, RNA-Binding Proteins/metabolism, Transcriptome
15.
Nat Commun ; 11(1): 3696, 2020 07 29.
Article in English | MEDLINE | ID: mdl-32728046

ABSTRACT

ENCODE comprises thousands of functional genomics datasets, and the encyclopedia covers hundreds of cell types, providing a universal annotation for genome interpretation. However, for particular applications, it may be advantageous to use a customized annotation. Here, we develop such a custom annotation by leveraging advanced assays, such as eCLIP, Hi-C, and whole-genome STARR-seq on a number of data-rich ENCODE cell types. A key aspect of this annotation is comprehensive and experimentally derived networks of both transcription factors and RNA-binding proteins (TFs and RBPs). Cancer, a disease of system-wide dysregulation, is an ideal application for such a network-based annotation. Specifically, for cancer-associated cell types, we put regulators into hierarchies and measure their network change (rewiring) during oncogenesis. We also extensively survey TF-RBP crosstalk, highlighting how SUB1, a previously uncharacterized RBP, drives aberrant tumor expression and amplifies the effect of MYC, a well-known oncogenic TF. Furthermore, we show how our annotation allows us to place oncogenic transformations in the context of a broad cell space; here, many normal-to-tumor transitions move towards a stem-like state, while oncogene knockdowns show an opposing trend. Finally, we organize the resource into a coherent workflow to prioritize key elements and variants, in addition to regulators. We showcase the application of this prioritization to somatic burdening, cancer differential expression and GWAS. Targeted validations of the prioritized regulators, elements and variants using siRNA knockdowns, CRISPR-based editing, and luciferase assays demonstrate the value of the ENCODE resource.


Subjects
Genetic Databases, Genomics, Neoplasms/genetics, Tumor Cell Line, Neoplastic Cell Transformation/genetics, Gene Regulatory Networks, Humans, Mutation/genetics, Reproducibility of Results, Transcription Factors/metabolism
16.
Structure ; 27(9): 1469-1481.e3, 2019 09 03.
Article in English | MEDLINE | ID: mdl-31279629

ABSTRACT

A key issue in drug design is how population variation affects drug efficacy by altering binding affinity (BA) in different individuals, an essential consideration for government regulators. Ideally, we would like to evaluate the BA perturbations of millions of single-nucleotide variants (SNVs). However, only hundreds of protein-drug complexes with SNVs have experimentally characterized BAs, constituting too small a gold standard for straightforward statistical model training. Thus, we take a hybrid approach: using physically based calculations to bootstrap the parameterization of a full model. In particular, we do 3D structure-based docking on ∼10,000 SNVs modifying known protein-drug complexes to construct a pseudo gold standard. Then we use this augmented set of BAs to train a statistical model combining structure, ligand and sequence features and illustrate how it can be applied to millions of SNVs. Finally, we show that our model has good cross-validated performance (97% AUROC) and can also be validated by orthogonal ligand-binding data.


Subjects
Computational Biology/methods, Single Nucleotide Polymorphism, Proteins/chemistry, Proteins/genetics, Protein Databases, Drug Design, Humans, Ligands, Machine Learning, Statistical Models, Molecular Docking Simulation, Protein Binding, Protein Conformation, Proteins/metabolism
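The binding-affinity model in the entry above is summarized by its cross-validated AUROC (97%). AUROC equals the probability that a randomly chosen positive example is scored above a randomly chosen negative one, with ties counting half (the normalized Mann-Whitney U statistic). The sketch computes it directly from that definition; `auroc` is a hypothetical helper and the example scores are invented, not taken from the study.

```python
def auroc(scores, labels):
    """Area under the ROC curve via pairwise comparison of positives and negatives.

    Equivalent to the Mann-Whitney U statistic divided by n_pos * n_neg.
    O(n_pos * n_neg), which is fine for small evaluation sets.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5  # ties count as half a correct ordering
    return wins / (len(pos) * len(neg))


# Invented scores: one positive (0.2) is ranked below one negative (0.8).
example = auroc([0.9, 0.2, 0.8, 0.1], [1, 1, 0, 0])
```

Here three of the four positive/negative pairs are ordered correctly, giving 0.75. Under cross-validation the statistic is computed on held-out folds, so it measures ranking quality on data the model did not see during training.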
17.
Curr Protoc Bioinformatics ; 64(1): e58, 2018 12.
Article in English | MEDLINE | ID: mdl-30408350

ABSTRACT

RNA-RNA interactions (RRIs) are essential to understanding the regulatory mechanisms of RNAs. Mapping RRIs in vivo in a transcriptome-wide manner remained challenging until the recent development of several sequencing-based technologies. However, RRIs generated from large-scale studies had not been systematically collected and analyzed before. This article introduces RISE, a database of the RNA Interactome from Sequencing Experiments. RISE provides a comprehensive collection of RRIs in human, mouse, and yeast, derived from transcriptome-wide sequencing experiments, as well as targeted sequencing studies and other public databases/datasets. To facilitate better understanding of the biological roles of these RRIs, RISE also offers rich functional annotations involving RNAs, and an interactive interface to explore the analysis results. Here, we provide a brief description of the RISE website and a step-by-step protocol for using RISE to study RRIs. © 2018 by John Wiley & Sons, Inc.


Subjects
Genetic Databases, RNA/metabolism, RNA Sequence Analysis, Molecular Sequence Annotation, Mutation/genetics, Single Nucleotide Polymorphism/genetics, RNA Sequence Analysis/methods
18.
Science ; 362(6420)2018 12 14.
Article in English | MEDLINE | ID: mdl-30545856

ABSTRACT

Most genetic risk for psychiatric disease lies in regulatory regions, implicating pathogenic dysregulation of gene expression and splicing. However, comprehensive assessments of transcriptomic organization in diseased brains are limited. In this work, we integrated genotypes and RNA sequencing in brain samples from 1695 individuals with autism spectrum disorder (ASD), schizophrenia, and bipolar disorder, as well as controls. More than 25% of the transcriptome exhibits differential splicing or expression, with isoform-level changes capturing the largest disease effects and genetic enrichments. Coexpression networks isolate disease-specific neuronal alterations, as well as microglial, astrocyte, and interferon-response modules defining previously unidentified neural-immune mechanisms. We integrated genetic and genomic data to perform a transcriptome-wide association study, prioritizing disease loci likely mediated by cis effects on brain expression. This transcriptome-wide characterization of the molecular pathology across three major psychiatric disorders provides a comprehensive resource for mechanistic insight and therapeutic development.


Subjects
Autism Spectrum Disorder/genetics, Bipolar Disorder/genetics, Genetic Predisposition to Disease, RNA Splicing, Schizophrenia/genetics, Brain/metabolism, Humans, Protein Isoforms/genetics, RNA Sequence Analysis, Transcriptome
19.
Science ; 362(6420)2018 12 14.
Article in English | MEDLINE | ID: mdl-30545857

ABSTRACT

Despite progress in defining genetic risk for psychiatric disorders, their molecular mechanisms remain elusive. Addressing this, the PsychENCODE Consortium has generated a comprehensive online resource for the adult brain across 1866 individuals. The PsychENCODE resource contains ~79,000 brain-active enhancers, sets of Hi-C linkages, and topologically associating domains; single-cell expression profiles for many cell types; expression quantitative-trait loci (QTLs); and further QTLs associated with chromatin, splicing, and cell-type proportions. Integration shows that varying cell-type proportions largely account for the cross-population variation in expression (with >88% reconstruction accuracy). It also allows building of a gene regulatory network, linking genome-wide association study variants to genes (e.g., 321 for schizophrenia). We embed this network into an interpretable deep-learning model, which improves disease prediction by ~6-fold versus polygenic risk scores and identifies key genes and pathways in psychiatric disorders.


Subjects
Brain/metabolism, Gene Expression Regulation, Mental Disorders/genetics, Datasets as Topic, Deep Learning, Genetic Enhancer Elements, Genetic Epigenesis, Epigenomics, Gene Regulatory Networks, Genome-Wide Association Study, Humans, Quantitative Trait Loci, Single-Cell Analysis, Transcriptome
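The PsychENCODE deep-learning model is benchmarked against polygenic risk scores (PRS). In its basic form, a PRS is a weighted sum of an individual's risk-allele dosages (0, 1 or 2), with weights taken from GWAS effect sizes such as log odds ratios. The sketch shows only that baseline, not the PsychENCODE model; `polygenic_risk_score`, the variant IDs and the weights are hypothetical, and skipping missing genotypes is one assumed convention among several.

```python
def polygenic_risk_score(dosages, effect_sizes):
    """Basic PRS: sum over variants of risk-allele dosage (0/1/2) times GWAS effect size.

    dosages: dict variant_id -> risk-allele count for one individual.
    effect_sizes: dict variant_id -> GWAS weight (e.g. log odds ratio).
    Variants without a genotype for this individual are skipped.
    """
    return sum(beta * dosages[v] for v, beta in effect_sizes.items() if v in dosages)


weights = {"rs0001": 0.2, "rs0002": -0.1, "rs0003": 0.05}  # invented GWAS weights
person = {"rs0001": 2, "rs0002": 1}                        # rs0003 not genotyped
score = polygenic_risk_score(person, weights)
```

Skipping an ungenotyped variant treats it as contributing zero; practical pipelines often impute the expected dosage instead and also prune or shrink correlated variants before summing.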