Search | BVS Search Portal

RAVAR: a curated repository for rare variant-trait associations.

Cao, Chen; Shao, Mengting; Zuo, Chunman; Kwok, Devin; Liu, Lin; Ge, Yuli; Zhang, Zilong; Cui, Feifei; Chen, Mingshuai; Fan, Rui; Ding, Yijie; Jiang, Hangjin; Wang, Guishen; Zou, Quan.

Nucleic Acids Res ; 52(D1): D990-D997, 2024 Jan 05.

Article in English | MEDLINE | ID: mdl-37831073

ABSTRACT

Rare variants contribute significantly to the genetic causes of complex traits, as they can have much larger effects than common variants and account for much of the missing heritability in genome-wide association studies. The emergence of UK Biobank scale datasets and accurate gene-level rare variant-trait association testing methods has dramatically increased the number of rare variant associations that have been detected. However, no systematic collection of these associations has been carried out to date, especially at the gene level. To address the issue, we present the Rare Variant Association Repository (RAVAR), a comprehensive collection of rare variant associations. RAVAR includes 95 047 high-quality rare variant associations (76 186 gene-level and 18 861 variant-level associations) for 4429 reported traits which are manually curated from 245 publications. RAVAR is the first resource to collect and curate published rare variant associations in an interactive web interface with integrated visualization, search, and download features. Detailed gene and SNP information is provided for each association, and users can conveniently search for related studies by exploring the EFO tree structure and interactive Manhattan plots. RAVAR could vastly improve the accessibility of rare variant studies. RAVAR is freely available for all users without login requirement at http://www.ravar.bio.

Subjects

Databases, Genetic; Genetic Variation; Genome-Wide Association Study; Genome-Wide Association Study/methods; Multifactorial Inheritance; Phenotype

webTWAS: a resource for disease candidate susceptibility genes identified by transcriptome-wide association study.

Cao, Chen; Wang, Jianhua; Kwok, Devin; Cui, Feifei; Zhang, Zilong; Zhao, Da; Li, Mulin Jun; Zou, Quan.

Nucleic Acids Res ; 50(D1): D1123-D1130, 2022 Jan 07.

Article in English | MEDLINE | ID: mdl-34669946

ABSTRACT

The development of transcriptome-wide association studies (TWAS) has enabled researchers to better identify and interpret causal genes in many diseases. However, there are currently no resources providing a comprehensive listing of gene-disease associations discovered by TWAS from published GWAS summary statistics. TWAS analyses are also difficult to conduct due to the complexity of TWAS software pipelines. To address these issues, we introduce a new resource called webTWAS, which integrates a database of the most comprehensive disease GWAS datasets currently available with credible sets of potential causal genes identified by multiple TWAS software packages. Specifically, a total of 235 064 gene-disease associations for a wide range of human diseases are prioritized from 1298 high-quality downloadable European GWAS summary statistics. Associations are calculated with seven different statistical models based on three popular and representative TWAS software packages. Users can explore associations at the gene or disease level, and easily search for related studies or diseases using the MeSH disease tree. Since the effects of diseases are highly tissue-specific, webTWAS applies tissue-specific enrichment analysis to identify significant tissues. A user-friendly web server is also available to run custom TWAS analyses on user-provided GWAS summary statistics data. webTWAS is freely available at http://www.webtwas.net.

Subjects

Databases, Genetic; Genetic Diseases, Inborn/classification; Genetic Predisposition to Disease; Transcriptome/genetics; Gene Expression Profiling; Genetic Association Studies; Genetic Diseases, Inborn/genetics; Genome-Wide Association Study; Humans; Polymorphism, Single Nucleotide/genetics; Quantitative Trait Loci/genetics; Software

Power analysis of transcriptome-wide association study: Implications for practical protocol choice.

Cao, Chen; Ding, Bowei; Li, Qing; Kwok, Devin; Wu, Jingjing; Long, Quan.

PLoS Genet ; 17(2): e1009405, 2021 Feb.

Article in English | MEDLINE | ID: mdl-33635859

ABSTRACT

The transcriptome-wide association study (TWAS) has emerged as one of several promising techniques for integrating multi-scale 'omics' data into traditional genome-wide association studies (GWAS). Unlike GWAS, which associates phenotypic variance directly with genetic variants, TWAS uses a reference dataset to train a predictive model for gene expressions, which allows it to associate phenotype with variants through the mediating effect of expressions. Although effective, this core innovation of TWAS is poorly understood, since the predictive accuracy of the genotype-expression model is generally low and further bounded by expression heritability. This raises the question: to what degree does the accuracy of the expression model affect the power of TWAS? Furthermore, would replacing predictions with actual, experimentally determined expressions improve power? To answer these questions, we compared the power of GWAS, TWAS, and a hypothetical protocol utilizing real expression data. We derived non-centrality parameters (NCPs) for linear mixed models (LMMs) to enable closed-form calculations of statistical power that do not rely on specific protocol implementations. We examined two representative scenarios: causality (genotype contributes to phenotype through expression) and pleiotropy (genotype contributes directly to both phenotype and expression), and also tested the effects of various properties including expression heritability. Our analysis reveals two main outcomes: (1) Under pleiotropy, the use of predicted expressions in TWAS is superior to actual expressions. This explains why TWAS can function with weak expression models, and shows that TWAS remains relevant even when real expressions are available. (2) GWAS outperforms TWAS when expression heritability is below a threshold of 0.04 under causality, or 0.06 under pleiotropy. Analysis of existing publications suggests that TWAS has been misapplied in place of GWAS, in situations where expression heritability is low.
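
The abstract above relies on non-centrality parameters to obtain power in closed form. As a rough illustration of that general idea only, the sketch below converts an NCP into power for a test statistic assumed to follow a 1-degree-of-freedom chi-square distribution. The function name `power_from_ncp`, the 1-df assumption, and the default significance level are illustrative choices, not taken from the paper, and the LMM-specific NCP derivation itself is not reproduced here.

```python
from statistics import NormalDist


def power_from_ncp(ncp: float, alpha: float = 0.05) -> float:
    """Power of a 1-df chi-square test whose statistic has non-centrality `ncp`.

    Assumption (not from the source): the statistic is (Z + sqrt(ncp))^2 with
    Z standard normal, i.e. a non-central chi-square with one degree of freedom.
    """
    if ncp < 0:
        raise ValueError("ncp must be non-negative")
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")
    z = NormalDist()
    # Square root of the chi-square critical value at level alpha.
    crit = z.inv_cdf(1.0 - alpha / 2.0)
    shift = ncp ** 0.5
    # (Z + shift)^2 > crit^2  <=>  Z > crit - shift  or  Z < -crit - shift
    return (1.0 - z.cdf(crit - shift)) + z.cdf(-crit - shift)
```

With an NCP of zero the function returns the significance level itself, and power rises monotonically as the NCP grows, which is why comparing NCPs across protocols is enough to rank their power at a fixed threshold.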

Subjects

Gene Expression Profiling/methods; Genetic Variation; Genome-Wide Association Study/methods; Transcriptome; Algorithms; Genetic Pleiotropy/genetics; Genotype; Humans; Models, Genetic; Phenotype; Quantitative Trait Loci/genetics

kTWAS: integrating kernel machine with transcriptome-wide association studies improves statistical power and reveals novel genes.

Cao, Chen; Kwok, Devin; Edie, Shannon; Li, Qing; Ding, Bowei; Kossinna, Pathum; Campbell, Simone; Wu, Jingjing; Greenberg, Matthew; Long, Quan.

Brief Bioinform ; 22(4), 2021 Jul 20.

Article in English | MEDLINE | ID: mdl-33200776

ABSTRACT

The power of genotype-phenotype association mapping studies increases greatly when contributions from multiple variants in a focal region are meaningfully aggregated. Currently, there are two popular categories of variant aggregation methods. Transcriptome-wide association studies (TWAS) represent a set of emerging methods that select variants based on their effect on gene expressions, providing pretrained linear combinations of variants for downstream association mapping. In contrast to this, kernel methods such as sequence kernel association test (SKAT) model genotypic and phenotypic variance using various kernel functions that capture genetic similarity between subjects, allowing nonlinear effects to be included. From the perspective of machine learning, these two methods cover two complementary aspects of feature engineering: feature selection/pruning and feature aggregation. Thus far, no thorough comparison has been made between these categories, and no methods exist which incorporate the advantages of TWAS- and kernel-based methods. In this work, we developed a novel method called kernel-based TWAS (kTWAS) that applies TWAS-like feature selection to a SKAT-like kernel association test, combining the strengths of both approaches. Through extensive simulations, we demonstrate that kTWAS has higher power than TWAS and multiple SKAT-based protocols, and we identify novel disease-associated genes in Wellcome Trust Case Control Consortium genotyping array data and MSSNG (Autism) sequence data. The source code for kTWAS and our simulations is available in our GitHub repository (https://github.com/theLongLab/kTWAS).
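
To make the idea of a weighted kernel association test concrete, here is a minimal sketch of a SKAT-like score statistic in which per-variant weights are supplied from outside, for example from an expression-prediction model. It is not the kTWAS implementation (see the repository linked above for that). The function name `weighted_kernel_score`, the mean-centred phenotype as the null residual, and the diagonal weight matrix are simplifying assumptions, and the p-value calculation is omitted.

```python
def weighted_kernel_score(genotypes, phenotype, weights):
    """Return Q = r^T G W G^T r for a weighted linear kernel.

    genotypes: list of per-subject rows, each a list of allele counts per variant
    phenotype: list of per-subject trait values
    weights:   list of non-negative per-variant weights (assumed to come from
               a TWAS-like selection step; hypothetical input)
    r is the mean-centred phenotype, a stand-in for null-model residuals.
    """
    n = len(phenotype)
    if n == 0 or len(genotypes) != n:
        raise ValueError("genotypes and phenotype must have the same non-zero length")
    if any(len(row) != len(weights) for row in genotypes):
        raise ValueError("each genotype row needs one entry per weight")
    mean_y = sum(phenotype) / n
    resid = [y - mean_y for y in phenotype]
    q = 0.0
    for j, w in enumerate(weights):
        # Score for variant j: covariance-like sum between genotype and residual.
        score_j = sum(genotypes[i][j] * resid[i] for i in range(n))
        q += w * score_j ** 2
    return q
```

Setting a weight to zero removes that variant from the statistic entirely, which is how a selection step can be expressed through the kernel in this simplified form.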

Subjects

Computer Simulation; Genetic Association Studies; Genetic Variation; Models, Genetic; Software; Transcriptome; Genome-Wide Association Study; Genotype; Humans

Reconstruction of Microbial Haplotypes by Integration of Statistical and Physical Linkage in Scaffolding.

Cao, Chen; He, Jingni; Mak, Lauren; Perera, Deshan; Kwok, Devin; Wang, Jia; Li, Minghao; Mourier, Tobias; Gavriliuc, Stefan; Greenberg, Matthew; Morrissy, A Sorana; Sycuro, Laura K; Yang, Guang; Jeffares, Daniel C; Long, Quan.

Mol Biol Evol ; 38(6): 2660-2672, 2021 May 19.

Article in English | MEDLINE | ID: mdl-33547786

ABSTRACT

DNA sequencing technologies provide unprecedented opportunities to analyze within-host evolution of microorganism populations. Often, within-host populations are analyzed via pooled sequencing of the population, which contains multiple individuals or "haplotypes." However, current next-generation sequencing instruments, in conjunction with single-molecule barcoded linked-reads, cannot distinguish long haplotypes directly. Computational reconstruction of haplotypes from pooled sequencing has been attempted in virology, bacterial genomics, metagenomics, and human genetics, using algorithms based on either cross-host genetic sharing or within-host genomic reads. Here, we describe PoolHapX, a flexible computational approach that integrates information from both genetic sharing and genomic sequencing. We demonstrated that PoolHapX outperforms state-of-the-art tools tailored to specific organismal systems, and is robust to within-host evolution. Importantly, together with barcoded linked-reads, PoolHapX can infer whole-chromosome-scale haplotypes from 50 pools each containing 12 different haplotypes. By analyzing real data, we uncovered dynamic variations in the evolutionary processes of within-patient HIV populations previously unobserved in single position-based analysis.

Subjects

Genetic Techniques; Genetics, Microbial/methods; Haplotypes; Software; Algorithms; Biological Evolution; HIV/genetics; Humans; Plasmodium vivax/genetics

EHR-HGCN: An Enhanced Hybrid Approach for Text Classification Using Heterogeneous Graph Convolutional Networks in Electronic Health Records.

Wang, Guishen; Lou, Xiaoxue; Guo, Fang; Kwok, Devin; Cao, Chen.

IEEE J Biomed Health Inform ; 28(3): 1668-1679, 2024 Mar.

Article in English | MEDLINE | ID: mdl-38133976

ABSTRACT

Text classification is a central part of natural language processing, with important applications in understanding the knowledge behind biomedical texts including electronic health records (EHR). In this article, we propose a novel heterogeneous graph convolutional network method for classifying EHR texts. Our method, called EHR-HGCN, is able to combine context-sensitive word and sentence embeddings with structural sentence-level and word-level relation information to perform text classification. EHR-HGCN reframes EHR text classification as a graph classification task to better capture structural information about the document using a heterogeneous graph. To mine contextual information from a document, EHR-HGCN first applies a bidirectional recurrent neural network (BiRNN) on word embeddings obtained via Global Vectors for word representation (GloVe) to obtain context-sensitive word-level and sentence-level embeddings. To mine structural relationships from the document, EHR-HGCN then constructs a heterogeneous graph over the word and sentence embeddings, where sentence-word and word-word relationships are represented by graph edges. Finally, a heterogeneous graph convolutional neural network is used to classify documents by their graph representation. We evaluate EHR-HGCN on a variety of standard text classification benchmarks and find that EHR-HGCN has higher accuracy and F1-score than other representative machine learning and deep learning methods. We also apply EHR-HGCN to the MedLit benchmark and find it performs with high accuracy and F1-score on the task of section classification in EHR texts. Our ablation experiments show that the heterogeneous graph construction and heterogeneous graph convolutional network are critical to the performance of EHR-HGCN.
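
The graph-construction step described above can be sketched roughly as follows: a document becomes a graph with sentence nodes and word nodes, joined by sentence-word and word-word edges. The abstract does not say how word-word edges are chosen, so the sliding co-occurrence window used here is an assumption, as are the function name `build_heterogeneous_graph` and the tuple-based node and edge encoding. The embeddings and the graph convolution itself are left out.

```python
def build_heterogeneous_graph(sentences, window=2):
    """Build (nodes, edges) from a document given as a list of tokenised sentences.

    Nodes are ("sentence", index) or ("word", token) tuples.
    Edges are (node_a, node_b, edge_type) tuples with edge_type either
    "sentence-word" or "word-word".
    Assumption: two words are linked when they co-occur within `window`
    positions inside the same sentence.
    """
    if window < 0:
        raise ValueError("window must be non-negative")
    nodes = set()
    edges = set()
    for s_idx, tokens in enumerate(sentences):
        s_node = ("sentence", s_idx)
        nodes.add(s_node)
        for pos, word in enumerate(tokens):
            w_node = ("word", word)
            nodes.add(w_node)
            edges.add((s_node, w_node, "sentence-word"))
            # Link this word to the next `window` words in the sentence.
            for other in tokens[pos + 1 : pos + 1 + window]:
                if other != word:
                    a, b = sorted((word, other))
                    edges.add((("word", a), ("word", b), "word-word"))
    return nodes, edges
```

Word nodes are shared across sentences, so a term that appears in several sentences connects them indirectly, which is the structural information a graph-level classifier can then exploit.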

Subjects

Electronic Health Records; Neural Networks, Computer; Humans; Machine Learning; Natural Language Processing

Disentangling genetic feature selection and aggregation in transcriptome-wide association studies.

Cao, Chen; Kossinna, Pathum; Kwok, Devin; Li, Qing; He, Jingni; Su, Liya; Guo, Xingyi; Zhang, Qingrun; Long, Quan.

Genetics ; 220(2), 2022 Feb 04.

Article in English | MEDLINE | ID: mdl-34849857

ABSTRACT

The success of transcriptome-wide association studies (TWAS) has led to substantial research toward improving the predictive accuracy of its core component of genetically regulated expression (GReX). GReX links expression information with genotype and phenotype by playing two roles simultaneously: it acts as both the outcome of the genotype-based predictive models (for predicting expressions) and the linear combination of genotypes (as the predicted expressions) for association tests. From the perspective of machine learning (considering SNPs as features), these are actually two separable steps, feature selection and feature aggregation, which can be independently conducted. In this study, we show that the single approach of GReX limits the adaptability of TWAS methodology and practice. By conducting simulations and real data analysis, we demonstrate that disentangled protocols adapting straightforward approaches for feature selection (e.g., simple marker test) and aggregation (e.g., kernel machines) outperform the standard TWAS protocols that rely on GReX. Our development provides more powerful novel tools for conducting TWAS. More importantly, our characterization of the exact nature of TWAS suggests that, instead of questionably binding two distinct steps into the same statistical form (GReX), methodological research focusing on optimal combinations of feature selection and aggregation approaches will bring higher power to TWAS protocols.
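
As a hedged illustration of the feature-selection half of such a disentangled protocol, the sketch below ranks SNPs by the absolute Pearson correlation between each SNP and measured expression and keeps the strongest few. Using marginal correlation as the "simple marker test", the top-k cutoff, and the names `pearson` and `select_snps` are all assumptions made for illustration. The aggregation step is deliberately left out so that the two steps stay separate.

```python
def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / (sxx * syy) ** 0.5


def select_snps(genotypes, expression, top_k):
    """Return sorted indices of the top_k SNPs by |correlation| with expression.

    genotypes:  list of per-subject rows of allele counts
    expression: list of per-subject expression values for one gene
    Ties are broken by the lower SNP index.
    """
    if not genotypes or len(genotypes) != len(expression):
        raise ValueError("genotypes and expression must have the same non-zero length")
    n_snps = len(genotypes[0])
    scored = []
    for j in range(n_snps):
        column = [row[j] for row in genotypes]
        scored.append((abs(pearson(column, expression)), j))
    scored.sort(key=lambda item: (-item[0], item[1]))
    return sorted(j for _, j in scored[:top_k])
```

The returned indices can then be handed to any aggregation method, which is the point of keeping selection independent of how the selected variants are later combined.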

Subjects

Genome-Wide Association Study; Transcriptome; Genetic Predisposition to Disease; Genome-Wide Association Study/methods; Humans; Phenotype; Polymorphism, Single Nucleotide; Quantitative Trait Loci
