1.
Am J Hum Genet ; 110(9): 1534-1548, 2023 09 07.
Article in English | MEDLINE | ID: mdl-37633278

ABSTRACT

Despite extensive research on global heritability estimation for complex traits, few methods accurately dissect local heritability. A precise local heritability estimate is crucial for high-resolution mapping in genetics. Here, we report the effective heritability estimator (EHE) that can use p values from genome-wide association studies (GWASs) for local heritability estimation by directly converting marginal heritability estimates of SNPs to a non-redundant heritability estimate of a gene or a small genomic region. EHE provides higher accuracy and precision for local heritability estimation among seven compared methods. Importantly, EHE can be applied to estimate the conditional heritability of nearby genes, where redundant heritability among the genes can also be removed further. The conditional estimation can be guided by tissue-specific expression profiles (or other functional scores) to prioritize and quantify more functionally important genes of complex phenotypes. Applying EHE to 42 complex phenotypes from the UK Biobank, we revealed the existence of two types of distinct genetic architectures for various complex phenotypes and found that highly pleiotropic genes are not enriched for more heritability compared to other candidate susceptibility genes. EHE provides an accurate and robust way to dissect the genetic architecture of complex phenotypes.
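The first step described above, turning a GWAS p value into a marginal heritability estimate for a single SNP, can be sketched as follows. This is a minimal illustration, not the EHE estimator itself: the abstract does not give EHE's formula, so the sketch assumes the textbook relation between a regression test statistic and variance explained, r² = t² / (t² + n − 2), with the t statistic approximated by a normal quantile. The function name `marginal_h2_from_p` is hypothetical.

```python
from statistics import NormalDist


def marginal_h2_from_p(p_value: float, n_samples: int) -> float:
    """Approximate the variance explained by one SNP from a two-sided p value.

    Assumption (not taken from the EHE paper): the p value comes from a
    simple linear regression, so r^2 = t^2 / (t^2 + n - 2), and the
    t statistic is approximated by a standard normal quantile.
    """
    if not 0.0 < p_value <= 1.0:
        raise ValueError("p_value must be in (0, 1]")
    if n_samples <= 2:
        raise ValueError("n_samples must be greater than 2")
    # Lower-tail quantile of p/2; squaring removes the sign.
    z = NormalDist().inv_cdf(p_value / 2.0)
    chi2 = z * z  # 1-df chi-square statistic
    return chi2 / (n_samples - 2 + chi2)
```

Summing such marginal values over the SNPs of a gene would double-count signal shared through linkage disequilibrium, which is the redundancy the abstract says EHE removes.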


Subjects
Genome-Wide Association Study, Genomics, Multifactorial Inheritance/genetics, Phenotype, Single Nucleotide Polymorphism/genetics
2.
Am J Hum Genet ; 109(5): 838-856, 2022 05 05.
Article in English | MEDLINE | ID: mdl-35460606

ABSTRACT

Isolating the causal genes from numerous genetic association signals in genome-wide association studies (GWASs) of complex phenotypes remains an open and challenging question. In the present study, we proposed a statistical approach, the effective-median-based Mendelian randomization (MR) framework (named EMIC), for inferring the causal genes of complex phenotypes from GWAS summary statistics. The effective-median method solved the high false-positive rate of existing MR methods, which arises from either correlation among instrumental variables or noise in approximated linkage disequilibrium (LD). EMIC can further perform a pleiotropy fine-mapping analysis to remove possible false-positive estimates. By using multiple cis-expression quantitative trait loci (eQTLs), EMIC was also more powerful than the alternative methods for causal gene inference in the simulated datasets. Furthermore, EMIC rediscovered many known causal genes of complex phenotypes (schizophrenia, bipolar disorder, and total cholesterol) and reported many new and promising candidate causal genes. In sum, this study provided an efficient solution for discriminating candidate causal genes from vast amounts of GWAS signals with eQTLs. EMIC has been implemented in our integrative software platform KGGSEE.
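To illustrate the general idea behind median-based MR, the sketch below computes a per-instrument Wald ratio (SNP effect on the outcome divided by SNP effect on the exposure) and takes the plain median. This is an assumption-laden simplification: the abstract does not define the "effective median", so the code does not reproduce EMIC, and it ignores instrument correlation and standard errors. The function names and the example effect sizes are hypothetical.

```python
from statistics import median


def wald_ratios(beta_exposure, beta_outcome):
    """Per-instrument causal estimates: outcome effect / exposure effect."""
    if len(beta_exposure) != len(beta_outcome):
        raise ValueError("effect-size lists must have the same length")
    if any(bx == 0 for bx in beta_exposure):
        raise ValueError("exposure effects must be non-zero")
    return [by / bx for bx, by in zip(beta_exposure, beta_outcome)]


def simple_median_mr(beta_exposure, beta_outcome):
    """Plain (unweighted) median of the Wald ratios; not EMIC's estimator."""
    return median(wald_ratios(beta_exposure, beta_outcome))


# Hypothetical eQTL effects on expression and GWAS effects on the phenotype.
# The third instrument is an outlier (e.g. pleiotropic) and barely moves the median.
estimate = simple_median_mr([0.2, 0.4, 0.5], [0.1, 0.2, 1.0])
```

A median is used rather than a mean because it stays stable when a minority of instruments give distorted ratios.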


Subjects
Genome-Wide Association Study, Mendelian Randomization Analysis, Genome-Wide Association Study/methods, Humans, Linkage Disequilibrium, Mendelian Randomization Analysis/methods, Phenotype, Single Nucleotide Polymorphism/genetics, Quantitative Trait Loci/genetics
3.
Mol Psychiatry ; 28(7): 2913-2921, 2023 Jul.
Article in English | MEDLINE | ID: mdl-37340172

ABSTRACT

Clinical epidemiological studies have found high co-occurrence between suicide attempts (SA) and opioid use disorder (OUD). However, the patterns of correlation and causation between them are still not clear due to psychiatric confounding. To investigate their cross-phenotype relationship, we utilized raw phenotypes and genotypes from >150,000 UK Biobank samples, and genome-wide association summary statistics from >600,000 individuals with European ancestry. Pairwise association and a potential bidirectional relationship between OUD and SA were evaluated with and without controlling for major psychiatric disease status (e.g., schizophrenia, major depressive disorder, and alcohol use disorder). Multiple statistical and genetics tools were used to perform epidemiological association, genetic correlation, polygenic risk score prediction, and Mendelian randomization (MR) analyses. Strong associations between OUD and SA were observed at both the phenotypic level (overall samples [OR = 2.94, P = 1.59 × 10⁻¹⁴]; non-psychiatric subgroup [OR = 2.15, P = 1.07 × 10⁻³]) and the genetic level (genetic correlation rg = 0.38 and 0.5 with or without conditioning on psychiatric traits, respectively). Consistently, increasing polygenic susceptibility to SA is associated with increasing risk of OUD (OR = 1.08, false discovery rate [FDR] = 1.71 × 10⁻³), and similarly, increasing polygenic susceptibility to OUD is associated with increasing risk of SA (OR = 1.09, FDR = 1.73 × 10⁻⁶). However, these polygenic associations were much attenuated after controlling for comorbid psychiatric diseases. A combination of MR analyses suggested a possible causal association from genetic liability for SA to OUD risk (2-sample univariable MR: OR = 1.14, P = 0.001; multivariable MR: OR = 1.08, P = 0.001). This study provided new genetic evidence to explain the observed OUD-SA comorbidity. Future prevention strategies for each phenotype need to take screening for the other into consideration.


Subjects
Major Depressive Disorder, Suicide Attempt, Humans, Major Depressive Disorder/genetics, Major Depressive Disorder/psychology, Genome-Wide Association Study, Mendelian Randomization Analysis, Phenotype
4.
Nucleic Acids Res ; 50(W1): W568-W576, 2022 07 05.
Article in English | MEDLINE | ID: mdl-35639771

ABSTRACT

Most complex disease-associated loci mapped by genome-wide association studies (GWAS) are located in non-coding regions. It remains elusive which genes the associated loci regulate and in which tissues/cell types the regulation occurs. Here, we present PCGA (https://pmglab.top/pcga), a comprehensive web server for jointly estimating both associated tissues/cell types and susceptibility genes for complex phenotypes by GWAS summary statistics. The web server is built on our published method, DESE, which represents an effective method to mutually estimate driver tissues and genes by integrating GWAS summary statistics and transcriptome data. By collecting and processing extensive bulk and single-cell RNA sequencing datasets, PCGA has included expression profiles of 54 human tissues, 2,214 human cell types and 4,384 mouse cell types, which provide the basis for estimating associated tissues/cell types and genes for complex phenotypes. We develop a framework to sequentially estimate associated tissues and cell types of a complex phenotype according to their hierarchical relationships we curated. Meanwhile, we construct a phenotype-cell-gene association landscape by estimating the associated tissues/cell types and genes of 1,871 public GWASs. The association landscape is generally consistent with biological knowledge and can be searched and browsed at the PCGA website.


Subjects
Cells, Computers, Genetic Predisposition to Disease, Genome-Wide Association Study, Internet, Phenotype, Software, Animals, Humans, Mice, Genome-Wide Association Study/methods, Transcriptome, Cells/metabolism, Organ Specificity
5.
Nucleic Acids Res ; 50(6): e34, 2022 04 08.
Article in English | MEDLINE | ID: mdl-34931221

ABSTRACT

Identifying rare variants that contribute to complex diseases is challenging because of the low statistical power in current tests comparing cases with controls. Here, we propose a novel and powerful rare variants association test based on the deviation of the observed mutation burden of a gene in cases from a baseline predicted by a weighted recursive truncated negative-binomial regression (RUNNER) on genomic features available from public data. Simulation studies show that RUNNER is substantially more powerful than state-of-the-art rare variant association tests and has reasonable type 1 error rates even for stratified populations or in small samples. Applied to real case-control data, RUNNER recapitulates known genes of Hirschsprung disease and Alzheimer's disease missed by current methods and detects promising new candidate genes for both disorders. In a case-only study, RUNNER successfully detected a known causal gene of amyotrophic lateral sclerosis. The present study provides a powerful and robust method to identify susceptibility genes with rare risk variants for complex diseases.


Subjects
Genetic Predisposition to Disease, Genetic Variation, Genetic Models, Software, Case-Control Studies, Computer Simulation, Humans, Mutation
6.
PLoS Genet ; 17(2): e1009363, 2021 02.
Article in English | MEDLINE | ID: mdl-33630843

ABSTRACT

Genome-wide association studies (GWASs) have identified multiple susceptibility loci for Alzheimer's disease (AD), which is characterized by early and progressive damage to the hippocampus. However, the association of hippocampal gene expression with AD and the underlying neurobiological pathways remain largely unknown. Based on the genomic and transcriptomic data of 111 hippocampal samples and the summary data of two large-scale meta-analyses of GWASs, a transcriptome-wide association study (TWAS) was performed to identify genes with significant associations between hippocampal expression and AD. We identified 54 significantly associated genes using an AD-GWAS meta-analysis of 455,258 individuals; 36 of the genes were confirmed in another AD-GWAS meta-analysis of 63,926 individuals. Fine-mapping models further prioritized 24 AD-related genes whose effects on AD were mediated by hippocampal expression, including APOE and two novel genes (PTPN9 and PCDHA4). These genes are functionally related to amyloid-beta formation, phosphorylation/dephosphorylation, neuronal apoptosis, neurogenesis and telomerase-related processes. By integrating the predicted hippocampal expression and neuroimaging data, we found that the hippocampal expression of QPCTL and ERCC2 showed significant difference between AD patients and cognitively normal elderly individuals as well as correlated with hippocampal volume. Mediation analysis further demonstrated that hippocampal volume mediated the effect of hippocampal gene expression (QPCTL and ERCC2) on AD. This study identifies two novel genes associated with AD by integrating hippocampal gene expression and genome-wide association data and reveals candidate hippocampus-mediated neurobiological pathways from gene expression to AD.


Subjects
Alzheimer Disease/genetics, Genetic Predisposition to Disease/genetics, Genome-Wide Association Study/methods, Hippocampus/metabolism, Single Nucleotide Polymorphism, Transcriptome/genetics, Aged, Aged 80 and Over, Alzheimer Disease/diagnostic imaging, Female, Gene Regulatory Networks/genetics, Genomics/methods, Hippocampus/diagnostic imaging, Humans, Magnetic Resonance Imaging/methods, Male, Whole Genome Sequencing/methods
7.
Genome Res ; 30(12): 1789-1801, 2020 12.
Article in English | MEDLINE | ID: mdl-33060171

ABSTRACT

The advances of large-scale genomics studies have enabled compilation of cell type-specific, genome-wide DNA functional elements at high resolution. With the growing volume of functional annotation data and sequencing variants, existing variant annotation algorithms lack the efficiency and scalability to process big genomic data, particularly when annotating whole-genome sequencing variants against a huge database with billions of genomic features. Here, we develop VarNote to rapidly annotate genome-scale variants in large and complex functional annotation resources. Equipped with a novel index system and a parallel random-sweep searching algorithm, VarNote shows substantial performance improvements (two to three orders of magnitude) over existing algorithms at different scales. It supports both region-based and allele-specific annotations and introduces advanced functions for the flexible extraction of annotations. By integrating massive base-wise and context-dependent annotations in the VarNote framework, we introduce three efficient and accurate pipelines to prioritize the causal regulatory variants for common diseases, Mendelian disorders, and cancers.


Subjects
Computational Biology/methods, Genetic Predisposition to Disease/genetics, Algorithms, Genetic Databases, Genetic Variation, Human Genome, Humans, Molecular Sequence Annotation, Whole Genome Sequencing
8.
BMC Med ; 21(1): 179, 2023 05 11.
Article in English | MEDLINE | ID: mdl-37170220

ABSTRACT

BACKGROUND: Oxidative stress (OS) is a key pathophysiological mechanism in Crohn's disease (CD). OS-related genes can be affected by environmental factors, intestinal inflammation, gut microbiota, and epigenetic changes. However, the role of OS as a potential CD etiological factor or triggering factor is unknown, as differentially expressed OS genes in CD can be either a cause or a subsequent change of intestinal inflammation. Herein, we used a multi-omics summary data-based Mendelian randomization (SMR) approach to identify putative causal effects and underlying mechanisms of OS genes in CD. METHODS: OS-related genes were extracted from the GeneCards database. Intestinal transcriptome datasets were collected from the Gene Expression Omnibus (GEO) database and meta-analyzed to identify differentially expressed genes (DEGs) related to OS in CD. Integration analyses of the largest CD genome-wide association study (GWAS) summaries with expression quantitative trait loci (eQTLs) and DNA methylation QTLs (mQTLs) from the blood were performed using SMR methods to prioritize putative blood OS genes and their regulatory elements associated with CD risk. Up-to-date intestinal eQTLs and fecal microbial QTLs (mbQTLs) were integrated to uncover potential interactions between host OS gene expression and gut microbiota through SMR and colocalization analysis. Two additional Mendelian randomization (MR) methods were used as sensitivity analyses. Putative results were validated in an independent multi-omics cohort from the First Affiliated Hospital of Sun Yat-sen University (FAH-SYS). RESULTS: A meta-analysis from six datasets identified 438 OS-related DEGs enriched in intestinal enterocytes in CD from 817 OS-related genes. Five genes from blood tissue were prioritized as candidate CD-causal genes using three-step SMR methods: BAD, SHC1, STAT3, MUC1, and GPX3. 
Furthermore, SMR analysis also identified five putative intestinal genes, three of which were involved in gene-microbiota interactions through colocalization analysis: MUC1, CD40, and PRKAB1. Validation results showed that 88.79% of DEGs were replicated in the FAH-SYS cohort. Associations between pairs of MUC1-Bacillus aciditolerans and PRKAB1-Escherichia coli in the FAH-SYS cohort were consistent with eQTL-mbQTL colocalization. CONCLUSIONS: This multi-omics integration study highlighted that OS genes causal to CD are regulated by DNA methylation and host-microbiota interactions. This provides evidence for future targeted functional research aimed at developing suitable therapeutic interventions and disease prevention.


Subjects
Crohn Disease, Gastrointestinal Microbiome, Humans, Crohn Disease/genetics, Genome-Wide Association Study, DNA Methylation/genetics, Gastrointestinal Microbiome/genetics, Mendelian Randomization Analysis/methods, Multiomics, Transcriptome, Inflammation, Oxidative Stress/genetics
9.
Brief Bioinform ; 20(6): 2098-2115, 2019 11 27.
Article in English | MEDLINE | ID: mdl-30102366

ABSTRACT

The causes of a disease and its therapies are not only related to genotypes, but also associated with other factors, including phenotypes, environmental exposures, drugs and chemical molecules. Distinguishing disease-related factors from many neutral factors is critical as well as difficult. Over the past two decades, bioinformaticians have developed many computational resources to integrate the omics data and discover associations among these factors. However, researchers and clinicians are experiencing difficulties in choosing appropriate resources from hundreds of relevant databases and software tools. Here, in order to assist the researchers and clinicians, we systematically review the public computational resources of human diseases related to genotypes, phenotypes, environment factors, drugs and chemical exposures. We briefly describe the development history of these computational resources, followed by the details of the relevant databases and software tools. We finally conclude with a discussion of current challenges and future opportunities as well as prospects on this topic.


Subjects
Computational Biology, Environmental Exposure, Genotype, Phenotype, Humans
10.
BMC Neurol ; 21(1): 96, 2021 Mar 02.
Article in English | MEDLINE | ID: mdl-33653295

ABSTRACT

BACKGROUND: Due to large genetic and phenotypic heterogeneity, the conventional workup for Charcot-Marie-Tooth (CMT) diagnosis is often underpowered, leading to diagnostic delay or even lack of diagnosis. In the present study, we explored how bioinformatics analysis of whole-exome sequencing (WES) data can be used to diagnose patients with CMT disease efficiently. CASE PRESENTATION: The proband is a 29-year-old woman who presented with severe amyotrophy and distal skeletal deformity, which had plagued her family for over 20 years, since she was 5 years old. No other abnormalities were detected in her speech, hearing, vision, or intelligence. Similar symptoms manifested in her younger brother, while her parents and her older brother were unaffected. To uncover the genetic cause of this disease, we performed exome sequencing for the proband and her parents. Subsequent bioinformatics analysis on the KGGSeq platform and further Sanger sequencing identified a novel homozygous GDAP1 nonsense mutation (c.218C > G, p.Ser73*) that was responsible for the disease in the family. This genetic finding then led to a quick diagnosis of CMT type 4A (CMT4A), confirmed by nerve conduction velocity and electromyography examination of the patients. CONCLUSIONS: The severe muscle atrophy and distal skeletal deformity in these patients were caused by a novel homozygous nonsense mutation in GDAP1 (c.218C > G, p.Ser73*), and the patients were ultimately diagnosed with CMT4A. This study expanded the mutation spectrum of CMT disease and demonstrated how affordable WES could be effectively employed for the clinical diagnosis of unexplained phenotypes.


Subjects
Charcot-Marie-Tooth Disease/diagnosis, Exome Sequencing/methods, Nerve Tissue Proteins/genetics, Adult, Asian People, Charcot-Marie-Tooth Disease/genetics, Preschool Child, China, Nonsense Codon, Delayed Diagnosis, Female, Homozygote, Humans, Male, Muscular Atrophy/genetics, Pedigree, Phenotype, Siblings
11.
Nucleic Acids Res ; 47(16): e96, 2019 09 19.
Article in English | MEDLINE | ID: mdl-31287869

ABSTRACT

Genomic identification of driver mutations and genes in cancer cells is critical for precision medicine. Due to the difficulty of modelling the distribution of background mutation counts, existing statistical methods are often underpowered to discriminate cancer-driver genes from passenger genes. Here we propose a novel statistical approach, weighted iterative zero-truncated negative-binomial regression (WITER, http://grass.cgs.hku.hk/limx/witer or KGGSeq, http://grass.cgs.hku.hk/limx/kggseq/), to detect cancer-driver genes showing an excess of somatic mutations. By fitting the distribution of background mutation counts properly, this approach works well even in small or moderate samples. Compared to alternative methods, it detected more significant and cancer-consensus genes in most tested cancers. Applying this approach, we estimated 229 driver genes in 26 different types of cancers. In silico validation confirmed 78% of predicted genes as likely known drivers and many other genes as very likely new drivers for corresponding cancers. The technical advances of WITER enable the detection of driver genes in TCGA datasets as small as 30 subjects and rescue of more genes missed by alternative tools in moderate or small samples.
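The distribution named in the method, a zero-truncated negative binomial, can be written down compactly. The sketch below shows only the probability mass function under a mean/dispersion parameterisation, which is an assumption made here for illustration; it does not implement WITER's weighting, iteration, or regression on covariates, none of which are specified in the abstract. The function names are hypothetical.

```python
import math


def nb_log_pmf(k: int, mean: float, size: float) -> float:
    """Log pmf of a negative binomial with the given mean and dispersion `size`."""
    return (
        math.lgamma(k + size) - math.lgamma(size) - math.lgamma(k + 1)
        + size * math.log(size / (size + mean))
        + k * math.log(mean / (size + mean))
    )


def zero_truncated_nb_pmf(k: int, mean: float, size: float) -> float:
    """P(Y = k | Y > 0): the ordinary pmf rescaled by 1 - P(Y = 0)."""
    if mean <= 0 or size <= 0:
        raise ValueError("mean and size must be positive")
    if k < 1:
        return 0.0  # zero counts are excluded by construction
    p_zero = math.exp(nb_log_pmf(0, mean, size))
    return math.exp(nb_log_pmf(k, mean, size)) / (1.0 - p_zero)
```

Truncation at zero is a natural fit when only genes with at least one observed mutation enter the model, so the zero class carries no information about the background rate.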


Subjects
Neoplastic Gene Expression Regulation, Genomics/statistics & numerical data, Neoplasm Proteins/genetics, Neoplasms/diagnosis, Oncogenes, Software, Benchmarking, Computer Simulation, Genomics/methods, Humans, Internet, Mutation, Neoplasm Proteins/classification, Neoplasm Proteins/metabolism, Neoplasms/classification, Neoplasms/genetics, Regression Analysis, Sample Size
12.
Hum Mol Genet ; 27(2): 351-358, 2018 01 15.
Article in English | MEDLINE | ID: mdl-29177441

ABSTRACT

The cloaca is an embryonic cavity that is divided into the urogenital sinus and rectum upon differentiation of the cloacal epithelium triggered by tissue-specific transcription factors including CDX2. Defective differentiation leads to persistent cloaca (PC) in humans, a phenotype recapitulated in Cdx2 mutant mice. PC is linked to hypo/hyper-vitaminosis A. Although no gene has ever been identified, there is strong evidence for a genetic contribution to PC. We applied whole-exome sequencing and copy-number-variant analyses to 21 PC patients and their unaffected parents. The damaging p.Cys132* and p.Arg237His de novo CDX2 variants were identified in two patients. These variants altered the expression of CYP26A1, a direct CDX2 target encoding the major retinoic acid (RA)-degrading enzyme. Other RA genes, including the RA-receptor alpha, were also mutated. Genes governing the development of cloaca-derived structures were recurrently mutated and over-represented in the basement-membrane components set (q-value < 1.65 × 10⁻⁶). Joint analysis of the patients' profile highlighted the extracellular matrix-receptor interaction pathway (MsigDBID: M7098, FDR: q-value < 7.16 × 10⁻⁹). This is the first evidence that PC is genetic, with genes involved in RA metabolism in the lead. Given the CDX2 de novo variants and the role of RA, our observations could potentiate preventive measures. For the first time, a gene recapitulating PC in mouse models is found mutated in humans.


Subjects
CDX2 Transcription Factor/genetics, CDX2 Transcription Factor/metabolism, Urogenital Abnormalities/genetics, Asian People/genetics, Cell Differentiation/genetics, Cloaca/embryology, DNA Copy Number Variations, Family, Female, Homeodomain Proteins/genetics, Humans, Male, Mutation, Phenotype, Urogenital Abnormalities/metabolism, Exome Sequencing
13.
Bioinformatics ; 35(4): 628-635, 2019 02 15.
Article in English | MEDLINE | ID: mdl-30101339

ABSTRACT

MOTIVATION: It remains challenging to unravel new susceptibility genes of complex diseases and the mechanisms in genome-wide association studies. There are at least two difficulties, isolation of the genuine susceptibility genes from many indirectly associated genes and functional validation of these genes. RESULTS: We first proposed a novel conditional gene-based association test which can use only summary statistics to isolate independently associated genes of a disease. Applying this method, we detected 185 genes of independent association with schizophrenia. We then designed an in-silico experiment based on expression/co-expression to systematically validate pathogenic potential of these genes. We found that genes of independent association with schizophrenia formed more co-expression pairs in normal post-natal but not pre-natal human brain regions than expected. Interestingly, no co-expression enrichment was found in the brain regions of schizophrenia patients. The genes with independent association also had more significant P-values for differential expression between schizophrenia patients and controls in the brain regions. In contrast, indirectly associated genes or associated genes by other widely-used gene-based tests had no such differential expression and co-expression patterns. In summary, this conditional gene-based association test is effective for isolating directly associated genes from indirectly associated genes, and the results insightfully suggest that common variants might contribute to schizophrenia largely by distorting expression and co-expression in post-natal brains. AVAILABILITY AND IMPLEMENTATION: The conditional gene-based association test has been implemented in a platform 'KGG' in Java and is publicly available at http://grass.cgs.hku.hk/limx/kgg/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Genome-Wide Association Study, Schizophrenia/genetics, Humans, Single Nucleotide Polymorphism
14.
BMC Cancer ; 20(1): 403, 2020 May 11.
Article in English | MEDLINE | ID: mdl-32393195

ABSTRACT

BACKGROUND: Recent genome-wide association studies (GWASs) have suggested several susceptibility loci of hepatitis B virus (HBV)-related hepatocellular carcinoma (HCC) by statistical analysis at individual single-nucleotide polymorphisms (SNPs). However, these loci only explain a small fraction of HBV-related HCC heritability. In the present study, we aimed to identify additional susceptibility loci of HBV-related HCC using advanced knowledge-based analysis. METHODS: We performed knowledge-based analysis (including gene- and gene-set-based association tests) on variant-level association p-values from two existing GWASs of HBV-related HCC. Five different types of gene-sets were collected for the association analysis. A number of SNPs within the gene prioritized by the knowledge-based association tests were selected to replicate genetic associations in an independent sample of 965 cases and 923 controls. RESULTS: The gene-based association analysis detected four genes significantly or suggestively associated with HBV-related HCC risk: SLC39A8, GOLGA8M, SMIM31, and WHAMMP2. The gene-set-based association analysis prioritized two promising gene sets for HCC, cell cycle G1/S transition and NOTCH1 intracellular domain regulates transcription. Within the gene sets, three promising candidate genes (CDC45, NCOR1 and KAT2A) were further prioritized for HCC. Among genes of liver-specific expression, multiple genes previously implicated in HCC were also highlighted. However, probably due to small sample size, none of the genes prioritized by the knowledge-based association analyses were successfully replicated by variant-level association test in the independent sample. CONCLUSIONS: This comprehensive knowledge-based association mining study suggested several promising genes and gene-sets associated with HBV-related HCC risks, which would facilitate follow-up functional studies on the pathogenic mechanism of HCC.


Subjects
Tumor Biomarkers/genetics, Hepatocellular Carcinoma/pathology, Genetic Predisposition to Disease, Hepatitis B Virus/isolation & purification, Hepatitis B/complications, Liver Neoplasms/pathology, Single Nucleotide Polymorphism, Hepatocellular Carcinoma/genetics, Hepatocellular Carcinoma/virology, Case-Control Studies, Female, Follow-Up Studies, Genome-Wide Association Study, Hepatitis B/virology, Humans, Knowledge Bases, Liver Neoplasms/genetics, Liver Neoplasms/virology, Male, Middle Aged, Prognosis
15.
Bioinformatics ; 34(18): 3145-3150, 2018 09 15.
Article in English | MEDLINE | ID: mdl-29718103

ABSTRACT

Motivation: Recently, many studies have shown that single nucleotide polymorphisms (SNPs) affect gene expression and contribute to the development of complex traits/diseases in a tissue context-dependent manner. However, little is known about the influence of haplotypes, which reflect the interaction effects between SNPs, on gene expression and complex traits. Results: In the present study, we first proposed a regulatory region-guided eQTL haplotype association analysis approach, and then systematically investigated the expression quantitative trait loci (eQTL) haplotypes in 20 different tissues using the approach. The approach is designed to reduce the computational burden by using regulatory predictions for candidate SNP selection and multiple testing corrections on non-independent haplotypes. The application results in multiple tissues showed that haplotype-based eQTLs not only increased the number of eQTL genes in a tissue-specific manner, but were also enriched in loci associated with complex traits in a tissue-matched manner. In addition, we found that tag SNPs of eQTL haplotypes from whole blood were selectively enriched in certain combinations of regulatory elements (e.g. promoters and enhancers) according to predicted chromatin states. In summary, this eQTL haplotype detection approach, together with the application results, sheds light on the synergistic effects of sequence variants on gene expression and on susceptibility to complex diseases. Availability and implementation: The executable application 'eHaplo' is implemented in Java and is publicly available at http://grass.cgs.hku.hk/limx/ehaplo/. Supplementary information: Supplementary data are available at Bioinformatics online.


Subjects
Haplotypes, Single Nucleotide Polymorphism, Quantitative Trait Loci, Genome-Wide Association Study, Multifactorial Inheritance, Phenotype
17.
Nucleic Acids Res ; 45(9): e75, 2017 May 19.
Article in English | MEDLINE | ID: mdl-28115622

ABSTRACT

Whole genome sequencing (WGS) is a promising strategy to unravel variants or genes responsible for human diseases and traits. However, there is a lack of robust platforms for a comprehensive downstream analysis. In the present study, we first proposed three novel algorithms, sequence gap-filled gene feature annotation, bit-block encoded genotypes and sectional fast access to text lines to address three fundamental problems. The three algorithms then formed the infrastructure of a robust parallel computing framework, KGGSeq, for integrating downstream analysis functions for whole genome sequencing data. KGGSeq has been equipped with a comprehensive set of analysis functions for quality control, filtration, annotation, pathogenic prediction and statistical tests. In the tests with whole genome sequencing data from 1000 Genomes Project, KGGSeq annotated several thousand more reliable non-synonymous variants than other widely used tools (e.g. ANNOVAR and SNPEff). It took only around half an hour on a small server with 10 CPUs to access genotypes of ∼60 million variants of 2504 subjects, while a popular alternative tool required around one day. KGGSeq's bit-block genotype format used 1.5% or less space to flexibly represent phased or unphased genotypes with multiple alleles and achieved a speed of over 1000 times faster to calculate genotypic correlation.
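The space saving reported for bit-level genotype storage can be illustrated with a generic 2-bit packing scheme. This sketch is not KGGSeq's bit-block format, whose layout the abstract does not describe; it assumes unphased biallelic genotypes coded as 0, 1, or 2 alternate alleles with 3 reserved for missing, and the function names are hypothetical.

```python
def pack_genotypes(genotypes):
    """Pack genotype codes (0, 1, 2, or 3 = missing) four to a byte."""
    packed = bytearray((len(genotypes) + 3) // 4)
    for i, g in enumerate(genotypes):
        if g not in (0, 1, 2, 3):
            raise ValueError(f"invalid genotype code: {g!r}")
        # Each genotype occupies 2 bits; position within the byte is i % 4.
        packed[i // 4] |= g << (2 * (i % 4))
    return bytes(packed)


def unpack_genotypes(packed, n_genotypes):
    """Recover the first n_genotypes codes from the packed bytes."""
    return [(packed[i // 4] >> (2 * (i % 4))) & 0b11 for i in range(n_genotypes)]


# Six genotypes fit in two bytes instead of six or more text characters.
codes = [0, 1, 2, 3, 2, 1]
blob = pack_genotypes(codes)
restored = unpack_genotypes(blob, len(codes))
```

Packed representations of this kind also allow many genotypes to be compared with a single bitwise operation, which is one plausible route to the fast correlation calculations the abstract mentions.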


Subjects
Algorithms, Human Genome, High-Throughput Nucleotide Sequencing/methods, DNA Sequence Analysis/methods, Humans
18.
Cell Mol Life Sci ; 74(9): 1721-1739, 2017 05.
Article in English | MEDLINE | ID: mdl-27990575

ABSTRACT

The development of the central nervous system (CNS) is a complex process that must be exquisitely controlled at multiple levels to ensure the production of appropriate types and quantity of neurons. RNA alternative polyadenylation (APA) contributes to transcriptome diversity and gene regulation, and has recently been shown to be widespread in the CNS. However, the previous studies have been primarily focused on the tissue specificity of APA and developmental APA change of whole model organisms; a systematic survey of APA usage is lacking during CNS development. Here, we conducted global analysis of APA during mouse retinal development, and identified stage-specific polyadenylation (pA) sites that are enriched for genes critical for retinal development and visual perception. Moreover, we demonstrated 3'UTR (untranslated region) lengthening and increased usage of intronic pA sites over development that would result in gaining many different RBP (RNA-binding protein) and miRNA target sites. Furthermore, we showed that a considerable number of polyadenylated lncRNAs are co-expressed with protein-coding genes involved in retinal development and functions. Together, our data indicate that APA is highly and dynamically regulated during retinal development and maturation, suggesting that APA may serve as a crucial mechanism of gene regulation underlying the delicate process of CNS development.


Subjects
Polyadenylation , Retina/embryology , Retina/metabolism , 3' Untranslated Regions/genetics , Animals , Base Sequence , Gene Expression Regulation, Developmental , Mice, Inbred C57BL , Nucleotide Motifs/genetics , Polyadenylation/genetics , RNA, Long Noncoding/genetics , RNA, Messenger/genetics , RNA, Messenger/metabolism , Time Factors
19.
Am J Med Genet B Neuropsychiatr Genet ; 177(1): 86-92, 2018 Jan.
Article in English | MEDLINE | ID: mdl-29150900

ABSTRACT

Epilepsy and schizophrenia are common neurological and mental illnesses, respectively, and they are sometimes comorbid in the same patients; however, the underlying genetic relationship between the two brain diseases is still not fully understood. To investigate the possible genetic contribution to their comorbidity, we performed polygenic risk score (PRS) analyses and genetic correlation estimation to identify the overall genetic overlap between the two diseases. The global schizophrenia PRS is strongly associated with the schizophrenia phenotype in the Hong Kong population (odds ratio = 1.7, p = 2.26E-16), and the focal epilepsy PRS is moderately associated with the epilepsy phenotype in the Hong Kong population (odds ratio = 1.14, p = 0.013). However, each disease-specific PRS predicts only its own matched phenotype and not the other (p > 0.05). This pattern is further supported by the non-significant pairwise genetic correlation and by the insufficient statistical power for PRS association in the cross-phenotype analyses. Our study reveals that there is limited shared genetic aetiology between schizophrenia and epilepsy, and thus supports a model of shared environmental factors to explain the comorbidity between the two phenotypes.
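
As a rough illustration of what a polygenic risk score is, the sketch below computes the standard weighted sum of risk-allele dosages and standardizes it across a cohort. It is not the pipeline used in the study, which the abstract does not describe; the function names and the toy weights and dosages are hypothetical.

```python
from statistics import mean, pstdev

def polygenic_risk_score(dosages, weights):
    """Weighted sum of risk-allele dosages (0..2) for one individual.

    `weights` are per-variant effect sizes, e.g. log odds ratios taken
    from GWAS summary statistics (hypothetical values in this sketch).
    """
    if len(dosages) != len(weights):
        raise ValueError("one weight is required per variant")
    return sum(d * w for d, w in zip(dosages, weights))

def standardize(scores):
    """Convert raw scores to z-scores so effects can be read per standard deviation."""
    mu = mean(scores)
    sd = pstdev(scores)
    if sd == 0:
        raise ValueError("scores have zero variance")
    return [(s - mu) / sd for s in scores]

# Toy cohort: three individuals genotyped at three variants (illustrative only).
weights = [0.1, -0.2, 0.3]
cohort = [[0, 1, 2], [2, 0, 1], [1, 1, 1]]
raw_scores = [polygenic_risk_score(d, weights) for d in cohort]
z_scores = standardize(raw_scores)
```

The standardized scores would then typically be entered as a predictor in a logistic regression of case status, which is where an odds ratio and p value for the PRS would come from.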


Subjects
Epilepsy/genetics , Schizophrenia/genetics , Adult , Asian People/genetics , Comorbidity , Female , Genetic Predisposition to Disease/genetics , Genome-Wide Association Study/methods , Hong Kong/epidemiology , Humans , Male , Middle Aged , Multifactorial Inheritance/genetics , Odds Ratio , Phenotype , Risk Factors
20.
Bioinformatics ; 32(20): 3065-3071, 2016 10 15.
Article in English | MEDLINE | ID: mdl-27354691

ABSTRACT

MOTIVATION: Exome sequencing studies have facilitated the detection of causal genetic variants in yet-unsolved Mendelian diseases. However, the identification of disease-causal genes among a list of candidates in an exome sequencing study is still not fully settled, and it is often difficult to prioritize candidate genes for follow-up studies. The inheritance mode provides crucial information for understanding Mendelian diseases, but none of the existing gene prioritization tools fully utilizes this information. RESULTS: We examined the characteristics of Mendelian disease genes under different inheritance modes. The results suggest that Mendelian disease genes with an autosomal dominant (AD) inheritance mode are more sensitive to haploinsufficiency and de novo mutations, whereas autosomal recessive (AR) genes have significantly more non-synonymous variants and regulatory transcript isoforms. In addition, the X-linked (XL) Mendelian disease genes have fewer non-synonymous and synonymous variants. As a result, we derived a new scoring system for prioritizing candidate genes for Mendelian diseases according to the inheritance mode. Our scoring system assigned to each annotated protein-coding gene (N = 18 859) three pathogenic scores according to the inheritance mode (AD, AR and XL). This inheritance mode-specific framework achieved higher accuracy (area under curve = 0.84) in XL mode. CONCLUSION: The inheritance-mode specific pathogenicity prioritization (ISPP) outperformed other well-known methods including Haploinsufficiency, Recessive, Network centrality, Genic Intolerance, Gene Damage Index and Gene Constraint scores. This systematic study suggests that genes manifesting disease inheritance modes tend to have unique characteristics. AVAILABILITY AND IMPLEMENTATION: ISPP is included in KGGSeq v1.0 (http://grass.cgs.hku.hk/limx/kggseq/), and source code is available from https://github.com/jacobhsu35/ISPP.git.
CONTACT: mxli@hku.hk SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
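
The area under curve reported above can be read as the probability that a randomly chosen disease gene receives a higher score than a randomly chosen non-disease gene. The following minimal sketch computes it by direct pairwise comparison; it is not the ISPP evaluation code, and the function name and scores are hypothetical.

```python
def auc(scores_pos, scores_neg):
    """Area under the ROC curve via pairwise comparison (Mann-Whitney form).

    scores_pos: scores of known disease genes (hypothetical values).
    scores_neg: scores of non-disease genes (hypothetical values).
    Ties between a positive and a negative count as half a win.
    """
    if not scores_pos or not scores_neg:
        raise ValueError("both classes need at least one score")
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))
```

The pairwise form costs time proportional to the product of the two class sizes, which is fine for illustration; a rank-based computation is preferable for genome-wide gene lists.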


Subjects
Genes, Dominant , Genes, Recessive , Mutation , Proteins/genetics , Area Under Curve , Databases, Genetic , Genetic Variation , Humans