Results 1 - 20 of 23
1.
Sci Rep ; 14(1): 10105, 2024 05 02.
Article in English | MEDLINE | ID: mdl-38698020

ABSTRACT

Colorectal cancer (CRC) is one of the top five most common and life-threatening malignancies worldwide. Most CRC develops from advanced colorectal adenoma (ACA), a precancerous stage, through the adenoma-carcinoma sequence. However, its underlying mechanisms, including how the tumor microenvironment changes, remain elusive. Therefore, we conducted an integrative analysis comparing RNA-seq data collected from 40 ACA patients who visited Dongguk University Ilsan Hospital with normal adjacent colons and tumor samples from 18 CRC patients collected from a public database. Differential expression analysis identified 21 and 79 sequentially up- or down-regulated genes across the continuum, respectively. The functional centrality of the continuum genes was assessed through network analysis, identifying 11 up- and 13 down-regulated hub-genes. Subsequently, we validated the prognostic effects of hub-genes using the Kaplan-Meier survival analysis. To estimate the immunological transition of the adenoma-carcinoma sequence, single-cell deconvolution and immune repertoire analyses were conducted. Significant composition changes for innate immunity cells and decreased plasma B-cells with immunoglobulin diversity were observed, along with distinctive immunoglobulin recombination patterns. Taken together, we believe our findings suggest underlying transcriptional and immunological changes during the adenoma-carcinoma sequence, contributing to the further development of pre-diagnostic markers for CRC.
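The Kaplan-Meier analysis mentioned above can be illustrated with a minimal sketch of the product-limit estimator. The function name, the input layout, and the toy follow-up times are assumptions made for illustration; they are not taken from the study.

```python
def kaplan_meier(times, events):
    """Product-limit survival estimate.

    times:  follow-up time for each patient (hypothetical toy input)
    events: 1 if the event was observed at that time, 0 if censored
    Returns a list of (time, survival probability) at each event time.
    """
    if len(times) != len(events):
        raise ValueError("times and events must have the same length")
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(set(times)):
        # Patients censored at t still count as at risk at t.
        deaths = sum(1 for ti, ei in zip(times, events) if ti == t and ei == 1)
        leaving = sum(1 for ti in times if ti == t)
        if deaths > 0:
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve


if __name__ == "__main__":
    # Five hypothetical patients; the second and fifth are censored.
    for t, s in kaplan_meier([1, 2, 3, 4, 5], [1, 0, 1, 1, 0]):
        print(t, round(s, 4))
```

In a hub-gene setting, one such curve would typically be computed per expression group (for example high versus low expression) and the curves compared with a log-rank test, which this sketch does not implement.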


Subject(s)
Adenoma , Colorectal Neoplasms , Computational Biology , Gene Expression Regulation, Neoplastic , Humans , Colorectal Neoplasms/genetics , Colorectal Neoplasms/immunology , Colorectal Neoplasms/pathology , Adenoma/genetics , Adenoma/immunology , Adenoma/pathology , Republic of Korea , Computational Biology/methods , Male , Female , Tumor Microenvironment/genetics , Tumor Microenvironment/immunology , Prognosis , Middle Aged , Aged , Biomarkers, Tumor/genetics , Kaplan-Meier Estimate , Gene Expression Profiling
2.
BMC Bioinformatics ; 25(1): 192, 2024 May 15.
Article in English | MEDLINE | ID: mdl-38750431

ABSTRACT

BACKGROUND: Researchers have long studied the regulatory processes of genes to uncover their functions. Gene regulatory network analysis is one of the popular approaches for understanding these processes, requiring accurate identification of interactions among the genes to establish the gene regulatory network. Advances in genome-wide association studies and expression quantitative trait loci studies have led to a wealth of genomic data, facilitating more accurate inference of gene-gene interactions. However, unknown confounding factors may influence these interactions, complicating their interpretation. Mendelian randomization (MR) has emerged as a valuable tool for causal inference in genetics, addressing confounding effects by estimating causal relationships using instrumental variables. In this paper, we propose a new statistical method, MR-GGI, for accurately inferring gene-gene interactions using Mendelian randomization. RESULTS: MR-GGI applies one gene as the exposure and another as the outcome, using causal cis-single-nucleotide polymorphisms as instrumental variables in the inverse-variance weighted MR model. Through simulations, we have demonstrated MR-GGI's ability to control type 1 error and maintain statistical power despite confounding effects. MR-GGI achieved the best F1 score among the compared methods on the DREAM5 dataset. Additionally, when applied to yeast genomic data, MR-GGI successfully identified six clusters. Through gene ontology analysis, we have confirmed that each cluster in our study performs distinct functional roles by gathering genes with specific functions. CONCLUSION: These findings demonstrate that MR-GGI accurately infers gene-gene interactions despite the confounding effects in real biological environments.
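The inverse-variance weighted (IVW) model named in the abstract can be sketched as follows. This is the standard fixed-effect IVW estimator from summary statistics, not the MR-GGI implementation itself; the function name, argument names, and toy numbers are assumptions.

```python
from math import sqrt


def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect IVW Mendelian randomization estimate.

    Each position is one instrument (for example a cis-SNP):
    beta_exposure: SNP effect on the exposure gene's expression
    beta_outcome:  SNP effect on the outcome gene's expression
    se_outcome:    standard error of beta_outcome
    Returns (causal effect estimate, standard error).
    """
    if not (len(beta_exposure) == len(beta_outcome) == len(se_outcome)):
        raise ValueError("all inputs must have the same length")
    weights = [1.0 / se ** 2 for se in se_outcome]
    numerator = sum(w * bx * by
                    for w, bx, by in zip(weights, beta_exposure, beta_outcome))
    denominator = sum(w * bx * bx for w, bx in zip(weights, beta_exposure))
    if denominator == 0:
        raise ValueError("instruments have no effect on the exposure")
    return numerator / denominator, sqrt(1.0 / denominator)


if __name__ == "__main__":
    # Hypothetical instruments whose outcome effects are half their exposure effects.
    estimate, std_error = ivw_estimate([0.5, 0.4, 0.6], [0.25, 0.2, 0.3], [0.1, 0.1, 0.1])
    print(estimate, std_error)
```

A gene pair would then be called interacting when the estimate divided by its standard error is large in absolute value; the threshold and any multiple-testing correction are left out of this sketch.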


Subject(s)
Mendelian Randomization Analysis , Polymorphism, Single Nucleotide , Genome-Wide Association Study/methods , Gene Regulatory Networks/genetics , Epistasis, Genetic/genetics , Quantitative Trait Loci , Humans , Saccharomyces cerevisiae/genetics
3.
bioRxiv ; 2024 Jan 16.
Article in English | MEDLINE | ID: mdl-38293199

ABSTRACT

Accurate identification of human leukocyte antigen (HLA) alleles is essential for various clinical and research applications, such as transplant matching and drug sensitivities. Recent advances in RNA-seq technology have made it possible to impute HLA types from sequencing data, spurring the development of a large number of computational HLA typing tools. However, the relative performance of these tools is unknown, limiting the ability of clinicians and biomedical researchers to make informed choices regarding which tools to use. Here we report the study design of a comprehensive benchmarking of the performance of 12 HLA callers across 682 RNA-seq samples from 8 datasets with molecularly defined gold standard at 5 loci, HLA-A, -B, -C, -DRB1, and -DQB1. For each HLA typing tool, we will comprehensively assess its accuracy, compare default with optimized parameters, and examine discrepancies in accuracy at the allele and locus levels. We will also evaluate the computational expense of each HLA caller measured in terms of CPU time and RAM. We also plan to evaluate the influence of read length over the HLA region on accuracy for each tool. Most notably, we will examine the performance of HLA callers across European and African groups, to determine discrepancies in accuracy associated with ancestry. We hypothesize that RNA-Seq HLA callers are capable of returning high-quality results, but the tools that offer a good balance between accuracy and computational expense for all ancestry groups are yet to be developed. We believe that our study will provide clinicians and researchers with clear guidance to inform their selection of an appropriate HLA caller.

4.
PLoS One ; 17(9): e0274879, 2022.
Article in English | MEDLINE | ID: mdl-36174000

ABSTRACT

Uterine fibroid is one of the most prevalent benign tumors in women, with high socioeconomic costs. Although genome-wide association studies (GWAS) have identified several loci associated with uterine fibroid risk, they could not successfully interpret the biological effects of genomic variants at the gene expression level. To prioritize uterine fibroid susceptibility genes that are biologically interpretable, we conducted a transcriptome-wide association study (TWAS) by integrating GWAS data of uterine fibroid and expression quantitative trait loci data. We identified nine significant TWAS genes including two novel genes, RP11-282O18.3 and KBTBD7, which may be causal genes for uterine fibroid. We conducted functional enrichment network analyses using the TWAS results to investigate the biological pathways in which the overall TWAS genes were involved. The results demonstrated the immune system process to be a key pathway in uterine fibroid pathogenesis. Finally, we carried out chemical-gene interaction analyses using the TWAS results and the comparative toxicogenomics database to determine the potential risk chemicals for uterine fibroid. We identified five toxic chemicals that were significantly associated with uterine fibroid TWAS genes, suggesting that they may be implicated in the pathogenesis of uterine fibroid. In this study, we performed an integrative analysis covering the broad application of bioinformatics approaches. Our study may provide a deeper understanding of uterine fibroid etiologies and useful information about potential risk chemicals for uterine fibroid.


Subject(s)
Leiomyoma , Transcriptome , Female , Genetic Markers , Genome-Wide Association Study , Humans , Leiomyoma/genetics , Toxicogenetics
5.
IEEE J Biomed Health Inform ; 26(12): 6150-6160, 2022 12.
Article in English | MEDLINE | ID: mdl-36070258

ABSTRACT

Ion channels, which can be modulated by peptides, are promising drug targets for neurological, metabolic, and cardiovascular disorders. Because it is expensive and labor-intensive to experimentally screen ion channel-modulating peptides (IMPs), in-silico approaches can serve as excellent alternatives. In this study, we present PrIMP, prediction models for screening IMPs that can target sodium, potassium, and calcium ion channels, as well as nicotinic acetylcholine receptors (nAChRs). To overcome the data insufficiency of the IMPs, we utilized two types of knowledge transfer approaches: multi-task learning (MTL) and transfer learning (TL). MTL enabled model training for four target tasks simultaneously with hard parameter sharing, thereby increasing model generalization. TL transferred knowledge of pre-trained model weights from antimicrobial peptide data, which was a much larger, naturally-occurring functional peptide dataset that could potentially improve the model performance. Both MTL and TL improved the prediction performance of the models. In addition, a hybrid approach combining deep learning with traditional machine learning yielded additional performance improvements. PrIMP achieved F1 scores of 0.924 (sodium ion channel), 0.937 (potassium ion channel), 0.898 (calcium ion channel), and 0.931 (nAChRs). The pre-processed dataset and proposed model are available at https://github.com/bzlee-bio/PrIMP.


Subject(s)
Ion Channels , Machine Learning , Humans , Peptides
6.
Commun Biol ; 5(1): 615, 2022 06 22.
Article in English | MEDLINE | ID: mdl-35729261

ABSTRACT

Atopic dermatitis (AD) is one of the most common inflammatory skin diseases and significantly impacts quality of life. A transcriptome-wide association study (TWAS) was conducted to estimate both transcriptomic and genomic features of AD and detected significant associations between 31 expression quantitative trait loci and 25 genes. Our results replicated well-known genetic markers for AD, as well as 4 novel associated genes. Next, a transcriptome meta-analysis was conducted with 5 studies retrieved from public databases and identified 5 additional novel susceptibility genes for AD. Applying the connectivity map to the results from TWAS and meta-analysis, robustly enriched perturbations were identified and their chemical or functional properties were analyzed. Here, we report the first research on integrative approaches for AD, combining TWAS and transcriptome meta-analysis. Together, our findings could provide a comprehensive understanding of the pathophysiologic mechanisms of AD and suggest potential drug candidates as alternative treatment options.


Subject(s)
Dermatitis, Atopic , Transcriptome , Dermatitis, Atopic/drug therapy , Dermatitis, Atopic/genetics , Dermatitis, Atopic/metabolism , Drug Repositioning , Genome-Wide Association Study/methods , Humans , Quality of Life
7.
Int J Mol Sci ; 22(6)2021 Mar 22.
Article in English | MEDLINE | ID: mdl-33809961

ABSTRACT

Amyotrophic lateral sclerosis (ALS) is a neurodegenerative neuromuscular disease. Although genome-wide association studies (GWAS) have successfully identified many variants significantly associated with ALS, it is still difficult to characterize the underlying biological mechanisms inducing ALS. In this study, we performed a transcriptome-wide association study (TWAS) to identify disease-specific genes in ALS. Using the largest ALS GWAS summary statistic (n = 80,610), we identified seven novel genes using 19 tissue reference panels. We conducted a conditional analysis to verify the genes' independence and to confirm that they are driven by genetically regulated expressions. Furthermore, we performed a TWAS-based enrichment analysis to highlight the association of important biological pathways, one in each of the four tissue reference panels. Finally, utilizing a connectivity map, a database of human cell expression profiles cultured with bioactive small molecules, we discovered functional associations between genes and drugs to identify 15 bioactive small molecules as potential drug candidates for ALS. We believe that, by integrating the largest ALS GWAS summary statistic with gene expression to identify new risk loci and causal genes, our study provides strong candidates for molecular basis experiments in ALS.


Subject(s)
Amyotrophic Lateral Sclerosis/genetics , Genetic Markers , Genetic Predisposition to Disease , Transcriptome , Amyotrophic Lateral Sclerosis/diagnosis , Amyotrophic Lateral Sclerosis/drug therapy , Biomarkers , Computational Biology/methods , Drug Development , Drug Repositioning , Gene Expression Profiling , Humans , Molecular Sequence Annotation , Molecular Targeted Therapy , Risk Assessment , Risk Factors , Workflow
8.
BMC Genomics ; 21(Suppl 10): 616, 2020 Nov 18.
Article in English | MEDLINE | ID: mdl-33208108

ABSTRACT

BACKGROUND: Regulatory hotspots are genetic variations that may regulate the expression levels of many genes. It has been of great interest to find those hotspots utilizing expression quantitative trait locus (eQTL) analysis. However, it has been reported that many of the findings are spurious hotspots induced by various unknown confounding factors. Recently, methods utilizing complicated statistical models have been developed that successfully identify genuine hotspots. Next-generation Intersample Correlation Emended (NICE) is one of the methods that show high sensitivity and low false-discovery rate in finding regulatory hotspots. Even though the methods successfully find genuine hotspots, they have not been widely used due to their non-user-friendly interfaces and complex running processes. Furthermore, most of the methods are impractical due to their prohibitively high computational complexity. RESULTS: To overcome the limitations of existing methods, we developed a fully automated web-based tool, referred to as NICER (NICE Renew), which is based on the NICE program. First, we dramatically reduced the burden of installing and running NICE. Second, we significantly reduced running time by incorporating multi-processing. Third, besides our web-based NICER, users can use NICER on Google Compute Engine and can readily install and run the NICER web service on their local computers. Finally, we provide different input formats and visualization tools to show results. Utilizing a yeast dataset, we show that NICER can be successfully used in an eQTL analysis to identify many genuine regulatory hotspots, more than half of which were previously reported elsewhere. CONCLUSIONS: Even though many hotspot analysis tools have been proposed, they have not been widely used for many practical reasons. NICER is a fully automated web-based solution for eQTL mapping and regulatory hotspot analysis. NICER provides a user-friendly interface and has made hotspot analysis more viable by reducing the running time significantly. We believe that NICER will become the method of choice for increasing the power of eQTL hotspot analysis.


Subject(s)
Quantitative Trait Loci , Saccharomyces cerevisiae , Chromosome Mapping , Internet , Models, Statistical , Saccharomyces cerevisiae/genetics
9.
Genes (Basel) ; 10(11)2019 10 30.
Article in English | MEDLINE | ID: mdl-31671645

ABSTRACT

Polymyositis (PM) and dermatomyositis (DM) are both classified as idiopathic inflammatory myopathies. They share a few common characteristics such as inflammation and muscle weakness. Previous studies have indicated that these diseases present aspects of an auto-immune disorder; however, their exact pathogenesis is still unclear. In this study, three gene expression datasets (PM: 7, DM: 50, Control: 13) available in public databases were used to conduct meta-analysis. We then conducted expression quantitative trait loci analysis to detect the variant sites that may contribute to the pathogenesis of PM and DM. Six hundred differentially expressed genes were identified in the meta-analysis (false discovery rate (FDR) < 0.01), among which 317 genes were up-regulated and 283 were down-regulated in the disease group compared with those in the healthy control group. The up-regulated genes were significantly enriched in interferon-signaling pathways, in protein secretion, and/or in unfolded-protein response. We detected 10 single nucleotide polymorphisms (SNPs) which could potentially play key roles in driving PM and DM. Along with previously reported genes, we identified 4 novel genes and 10 SNP-variant regions which could be used as candidates for potential drug targets or biomarkers for PM and DM.


Subject(s)
Dermatomyositis/genetics , Polymyositis/genetics , Biomarkers , Case-Control Studies , Databases, Genetic , Gene Expression/genetics , Gene Expression Profiling/methods , Genetic Markers/genetics , Genetic Predisposition to Disease/genetics , Humans , Interferons/genetics , Myositis/genetics , Polymorphism, Single Nucleotide/genetics , Quantitative Trait Loci/genetics , Unfolded Protein Response/genetics
10.
BMC Med Genomics ; 12(Suppl 5): 98, 2019 07 11.
Article in English | MEDLINE | ID: mdl-31296227

ABSTRACT

BACKGROUND: Dupuytren's disease (DD) is a fibroproliferative disorder characterized by thickening and contraction of the palmar fascia. The exact pathogenesis of DD remains unknown. RESULTS: In this study, we identified a co-expressed gene set (the DD signature) consisting of 753 genes via weighted gene co-expression network analysis. To confirm the robustness of the DD signature, module enrichment analysis and meta-analysis were performed. Moreover, this signature effectively classified DD disease samples. The DD signature was significantly enriched in unfolded protein response (UPR) related to endoplasmic reticulum (ER) stress. Next, we conducted multiple-phenotype regression analysis to identify trans-regulatory hotspots regulating expression levels of the DD signature using Genotype-Tissue Expression data. Finally, 10 trans-regulatory hotspots and 16 eGenes that are significantly associated with at least one cis-eQTL were identified. CONCLUSIONS: Among these eGenes, major histocompatibility complex class II genes and ZFP57 zinc finger protein were closely related to ER stress and UPR, suggesting that these genetic markers might be potential therapeutic targets for DD.


Subject(s)
Dupuytren Contracture/genetics , Gene Expression Profiling , Genetic Markers/genetics , Genomics , Animals , Gene Regulatory Networks , Humans
11.
Sci Rep ; 9(1): 3176, 2019 02 28.
Article in English | MEDLINE | ID: mdl-30816214

ABSTRACT

Characterization of protein structural changes in response to protein modifications, ligand or chemical binding, or protein-protein interactions is essential for understanding protein function and its regulation. Amide hydrogen/deuterium exchange (HDX) coupled with mass spectrometry (MS) is one of the most favorable tools for characterizing protein dynamics and changes in protein conformation. However, HDX-MS data analysis has not yet reached its full potential because it still requires manual validation by mass spectrometry experts. In particular, with the advent of high-throughput technologies, data size grows every day, and an automated tool is essential for the analysis. Here, we introduce fully automated software, referred to as 'deMix', for HDX-MS data analysis. deMix deals directly with the deuterated isotopic distributions rather than their centroid masses and is designed to be robust to random noise. In addition, unlike the existing approaches that can only determine a single state from an isotopic distribution, deMix can also detect a bimodal deuterated distribution, arising from EX1 behavior or heterogeneous peptides in conformational isomer proteins. Furthermore, deMix comes with visualization software to facilitate validation and representation of the analysis results.


Subject(s)
Hydrogen Deuterium Exchange-Mass Spectrometry/methods , Proteins/ultrastructure , Software , Protein Conformation , Proteins/chemistry
12.
J Comput Biol ; 26(11): 1203-1213, 2019 11.
Article in English | MEDLINE | ID: mdl-30272994

ABSTRACT

Genotype imputation has been widely utilized for two reasons in the analysis of genome-wide association studies (GWAS). One reason is to increase the power for association studies when causal single nucleotide polymorphisms are not collected in the GWAS. The second reason is to aid the interpretation of a GWAS result by predicting the association statistics at untyped variants. In this article, we show that prediction of association statistics at untyped variants that have an influence on the trait is overly conservative. Current imputation methods assume that none of the variants in a region (a locus consisting of multiple variants) affect the trait, which is often inconsistent with the observed data. In this article, we propose a new method, CAUSAL-Imp, which can impute the association statistics at untyped variants while taking into account variants in the region that may affect the trait. Our method builds on recent methods that impute the marginal statistics for GWAS by utilizing the fact that marginal statistics follow a multivariate normal distribution. We utilize both simulated and real data sets to assess the performance of our method. We show that traditional imputation approaches underestimate the association statistics for variants involved in the trait, and our results demonstrate that our approach provides less biased estimates of these association statistics.
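The multivariate normal property referred to above can be written out as a short worked equation. The notation is an assumption introduced here for illustration: $z_t$ and $z_u$ are the marginal statistics at typed and untyped variants, $\mu$ is their mean vector, and $\Sigma$ is the pairwise correlation (linkage disequilibrium) matrix partitioned accordingly.

```latex
\begin{pmatrix} z_u \\ z_t \end{pmatrix}
\sim \mathcal{N}\!\left(
  \begin{pmatrix} \mu_u \\ \mu_t \end{pmatrix},
  \begin{pmatrix} \Sigma_{uu} & \Sigma_{ut} \\ \Sigma_{tu} & \Sigma_{tt} \end{pmatrix}
\right)
\quad\Longrightarrow\quad
\begin{aligned}
\mathbb{E}[z_u \mid z_t] &= \mu_u + \Sigma_{ut}\,\Sigma_{tt}^{-1}\,(z_t - \mu_t), \\
\operatorname{Var}[z_u \mid z_t] &= \Sigma_{uu} - \Sigma_{ut}\,\Sigma_{tt}^{-1}\,\Sigma_{tu}.
\end{aligned}
```

Assuming that no variant in the region affects the trait corresponds to setting $\mu = 0$, which reduces the conditional mean to $\Sigma_{ut}\Sigma_{tt}^{-1} z_t$. Allowing a nonzero mean at variants that do affect the trait is, as far as the abstract indicates, the kind of relaxation CAUSAL-Imp targets; how that mean is modeled is not specified here.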


Subject(s)
Genome-Wide Association Study/statistics & numerical data , Genome/genetics , Software , Genotype , Humans , Phenotype , Polymorphism, Single Nucleotide/genetics
13.
Genetics ; 209(3): 685-698, 2018 07.
Article in English | MEDLINE | ID: mdl-29752291

ABSTRACT

Over the past few years, genome-wide association studies have identified many trait-associated loci that have different effects on females and males, which has increased attention to the genetic architecture differences between the sexes. The between-sex differences in genetic architectures can cause a variety of phenomena such as differences in the effect sizes at trait-associated loci, differences in the magnitudes of polygenic background effects, and differences in the phenotypic variances. However, current association testing approaches for dealing with sex, such as including sex as a covariate, cannot fully account for these phenomena and can be suboptimal in statistical power. We present a novel association mapping framework, MetaSex, that can comprehensively account for the genetic architecture differences between the sexes. Through simulations and applications to real data, we show that our framework outperforms previous approaches in association mapping.


Subject(s)
Chromosome Mapping/methods , Computational Biology/methods , Genome-Wide Association Study/methods , Sex Characteristics , Algorithms , Female , Humans , Male , Multifactorial Inheritance , Quantitative Trait Loci
14.
Am J Hum Genet ; 100(5): 789-802, 2017 May 04.
Article in English | MEDLINE | ID: mdl-28475861

ABSTRACT

Recent successes in genome-wide association studies (GWASs) make it possible to address important questions about the genetic architecture of complex traits, such as allele frequency and effect size. One lesser-known aspect of complex traits is the extent of allelic heterogeneity (AH) arising from multiple causal variants at a locus. We developed a computational method to infer the probability of AH and applied it to three GWASs and four expression quantitative trait loci (eQTL) datasets. We identified a total of 4,152 loci with strong evidence of AH. The proportion of all loci with identified AH is 4%-23% in eQTLs, 35% in GWASs of high-density lipoprotein (HDL), and 23% in GWASs of schizophrenia. For eQTLs, we observed a strong correlation between sample size and the proportion of loci with AH (R² = 0.85, p = 2.2 × 10⁻¹⁶), indicating that statistical power prevents identification of AH in other loci. Understanding the extent of AH may guide the development of new methods for fine mapping and association mapping of complex traits.


Subject(s)
Alleles , Gene Frequency , Quantitative Trait Loci , Databases, Genetic , Genetic Association Studies , Humans , Linkage Disequilibrium , Models, Molecular , Phenotype
15.
Am J Hum Genet ; 99(6): 1245-1260, 2016 Dec 01.
Article in English | MEDLINE | ID: mdl-27866706

ABSTRACT

The vast majority of genome-wide association study (GWAS) risk loci fall in non-coding regions of the genome. One possible hypothesis is that these GWAS risk loci alter the individual's disease risk through their effect on gene expression in different tissues. In order to understand the mechanisms driving a GWAS risk locus, it is helpful to determine which gene is affected in specific tissue types. For example, the relevant gene and tissue could play a role in the disease mechanism if the same variant responsible for a GWAS locus also affects gene expression. Identifying whether or not the same variant is causal in both GWASs and expression quantitative trait locus (eQTL) studies is challenging because of the uncertainty induced by linkage disequilibrium and the fact that some loci harbor multiple causal variants. However, current methods that address this problem assume that each locus contains a single causal variant. In this paper, we present eCAVIAR, a probabilistic method that has several key advantages over existing methods. First, our method can account for more than one causal variant in any given locus. Second, it can leverage summary statistics without accessing the individual genotype data. We use both simulated and real datasets to demonstrate the utility of our method. Using publicly available eQTL data on 45 different tissues, we demonstrate that eCAVIAR can prioritize likely relevant tissues and target genes for a set of glucose- and insulin-related trait loci.


Subject(s)
Genetic Predisposition to Disease/genetics , Genome-Wide Association Study/methods , Models, Genetic , Models, Statistical , Quantitative Trait Loci/genetics , Datasets as Topic , Gene Expression Regulation/genetics , Genotype , Glucose/metabolism , Humans , Insulin/metabolism , Linkage Disequilibrium , Organ Specificity , Probability , Sample Size
16.
Genetics ; 204(4): 1379-1390, 2016 Dec.
Article in English | MEDLINE | ID: mdl-27770036

ABSTRACT

A typical genome-wide association study tests correlation between a single phenotype and each genotype one at a time. However, single-phenotype analysis might miss unmeasured aspects of complex biological networks. Analyzing many phenotypes simultaneously may increase the power to capture these unmeasured aspects and detect more variants. Several multivariate approaches aim to detect variants related to more than one phenotype, but these current approaches do not consider the effects of population structure. As a result, these approaches may produce a significant number of false positive identifications. Here, we introduce a new methodology, referred to as GAMMA for generalized analysis of molecular variance for mixed-model analysis, which is capable of simultaneously analyzing many phenotypes and correcting for population structure. In a simulated study using data implanted with true genetic effects, GAMMA accurately identifies these true effects without producing false positives induced by population structure. In simulations with these data, GAMMA is an improvement over other methods which either fail to detect true effects or produce many false positive identifications. We further apply our method to genetic studies of yeast and gut microbiome from mice and show that GAMMA identifies several variants that are likely to have true biological mechanisms.


Subject(s)
Algorithms , Genome-Wide Association Study/methods , Phenotype , Animals , Humans , Mice , Polymorphism, Single Nucleotide , Population/genetics , Sensitivity and Specificity , Yeasts/genetics
17.
Genome Biol ; 17: 62, 2016 Apr 01.
Article in English | MEDLINE | ID: mdl-27039378

ABSTRACT

BACKGROUND: Multiple hypothesis testing is a major issue in genome-wide association studies (GWAS), which often analyze millions of markers. The permutation test is considered to be the gold standard in multiple testing correction as it accurately takes into account the correlation structure of the genome. Recently, the linear mixed model (LMM) has become the standard practice in GWAS, addressing issues of population structure and insufficient power. However, none of the current multiple testing approaches are applicable to LMM. RESULTS: We were able to estimate per-marker thresholds as accurately as the gold standard approach in real and simulated datasets, while reducing the time required from months to hours. We applied our approach to mouse, yeast, and human datasets to demonstrate the accuracy and efficiency of our approach. CONCLUSIONS: We provide an efficient and accurate multiple testing correction approach for linear mixed models. We further provide an intuition about the relationships between per-marker threshold, genetic relatedness, and heritability, based on our observations in real data.
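The permutation test that the abstract calls the gold standard can be sketched for a simple (non-mixed-model) association statistic. This illustrates only the baseline idea, not the authors' LMM-compatible method; the absolute-correlation statistic, function names, and defaults are assumptions.

```python
import random
from math import sqrt


def abs_correlation(x, y):
    """Absolute Pearson correlation; returns 0.0 if either input is constant."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return abs(cov) / sqrt(var_x * var_y)


def permutation_threshold(genotypes, phenotype, n_permutations=200, alpha=0.05, seed=0):
    """Family-wise significance threshold on the per-marker statistic.

    genotypes: list of markers, each a list of per-individual allele counts
    phenotype: list of per-individual trait values
    Shuffling the phenotype breaks any genotype-phenotype link while keeping
    the correlation between markers, so the maximum statistic over markers
    in each permutation reflects the effective number of tests.
    """
    rng = random.Random(seed)
    shuffled = list(phenotype)
    max_stats = []
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        max_stats.append(max(abs_correlation(marker, shuffled) for marker in genotypes))
    max_stats.sort()
    # Empirical (1 - alpha) quantile of the null distribution of the maximum.
    index = min(len(max_stats) - 1, int((1 - alpha) * len(max_stats)))
    return max_stats[index]
```

A marker is then declared significant when its observed statistic exceeds the returned threshold. Each permutation rescans every marker, which is the cost that makes this approach slow at genome scale.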


Subject(s)
Genome-Wide Association Study/standards , Yeasts/genetics , Animals , Computational Biology/methods , Databases, Genetic , Genome-Wide Association Study/methods , Humans , Linear Models , Mice , Software
18.
Genome Res ; 25(10): 1558-69, 2015 Oct.
Article in English | MEDLINE | ID: mdl-26260972

ABSTRACT

Genetics provides a potentially powerful approach to dissect host-gut microbiota interactions. Toward this end, we profiled gut microbiota using 16S rRNA gene sequencing in a panel of 110 diverse inbred strains of mice. This panel has previously been studied for a wide range of metabolic traits and can be used for high-resolution association mapping. Using a SNP-based approach with a linear mixed model, we estimated the heritability of microbiota composition. We conclude that, in a controlled environment, the genetic background accounts for a substantial fraction of abundance of most common microbiota. The mice were previously studied for response to a high-fat, high-sucrose diet, and we hypothesized that the dietary response was determined in part by gut microbiota composition. We tested this using a cross-fostering strategy in which a strain showing a modest response, SWR, was seeded with microbiota from a strain showing a strong response, A×B19. Consistent with a role of microbiota in dietary response, the cross-fostered SWR pups exhibited a significantly increased response in weight gain. To examine specific microbiota contributing to the response, we identified various genera whose abundance correlated with dietary response. Among these, we chose Akkermansia muciniphila, a common anaerobe previously associated with metabolic effects. When A. muciniphila was administered to strain A×B19 by gavage, the dietary response was significantly blunted for obesity, plasma lipids, and insulin resistance. In an effort to further understand host-microbiota interactions, we mapped loci controlling microbiota composition and prioritized candidate genes. Our publicly available data provide a resource for future studies.


Subject(s)
Gastrointestinal Microbiome/genetics , Animals , Diet , Diet, High-Fat , Environment , Female , Genome-Wide Association Study , Heredity , Male , Mice , Mice, Inbred Strains , Obesity/microbiology , RNA, Ribosomal, 16S , Sucrose/metabolism
19.
Bioinformatics ; 30(12): i204-11, 2014 Jun 15.
Article in English | MEDLINE | ID: mdl-24931985

ABSTRACT

MOTIVATION: High-throughput sequencing technologies have impacted many areas of genetic research. One such area is the identification of relatives from genetic data. The standard approach for the identification of genetic relatives collects the genomic data of all individuals and stores it in a database. Then, each pair of individuals is compared to detect the set of genetic relatives, and the matched individuals are informed. The main drawback of this approach is the requirement of sharing one's genetic data with a trusted third party to perform the relatedness test. RESULTS: In this work, we propose a secure protocol to detect genetic relatives from sequencing data while not exposing any information about their genomes. We assume that individuals have access to their genome sequences but do not want to share their genomes with anyone else. Unlike previous approaches, our approach uses both common and rare variants, which provides the ability to detect much more distant relationships securely. We use simulated data generated from the 1000 Genomes data and illustrate that we can easily detect up to fifth-degree cousins, which was not possible using the existing methods. We also show in the 1000 Genomes data with cryptic relationships that our method can detect these individuals. AVAILABILITY: The software is freely available for download at http://genetics.cs.ucla.edu/crypto/.


Subject(s)
Genetic Privacy , Genetic Variation , Genome, Human , Genomics/methods , Pedigree , Haplotypes , High-Throughput Nucleotide Sequencing , Humans
20.
Genome Biol ; 15(4): r61, 2014 Apr 07.
Article in English | MEDLINE | ID: mdl-24708878

ABSTRACT

Expression quantitative trait loci (eQTL) mapping is a tool that can systematically identify genetic variation affecting gene expression. eQTL mapping studies have shown that certain genomic locations, referred to as regulatory hotspots, may affect the expression levels of many genes. Recently, studies have shown that various confounding factors may induce spurious regulatory hotspots. Here, we introduce a novel statistical method that effectively eliminates spurious hotspots while retaining genuine hotspots. Applied to simulated and real datasets, we validate that our method achieves greater sensitivity while retaining low false discovery rates compared to previous methods.


Subject(s)
Algorithms , Gene Expression Profiling/methods , Polymorphism, Single Nucleotide , Regulatory Sequences, Nucleic Acid/genetics , Quantitative Trait Loci , Sensitivity and Specificity , Yeasts/genetics