Results 1 - 18 of 18
2.
Nat Biotechnol ; 40(3): 355-363, 2022 03.
Article in English | MEDLINE | ID: mdl-34675423

ABSTRACT

As single-cell datasets grow in sample size, there is a critical need to characterize cell states that vary across samples and associate with sample attributes, such as clinical phenotypes. Current statistical approaches typically map cells to clusters and then assess differences in cluster abundance. Here we present co-varying neighborhood analysis (CNA), an unbiased method to identify associated cell populations with greater flexibility than cluster-based approaches. CNA characterizes dominant axes of variation across samples by identifying groups of small regions in transcriptional space, termed neighborhoods, that co-vary in abundance across samples, suggesting shared function or regulation. CNA performs statistical testing for associations between any sample-level attribute and the abundances of these co-varying neighborhood groups. Simulations show that CNA enables more sensitive and accurate identification of disease-associated cell states than a cluster-based approach. When applied to published datasets, CNA captures a Notch activation signature in rheumatoid arthritis, identifies monocyte populations expanded in sepsis and identifies a novel T cell population associated with progression to active tuberculosis.


Subject(s)
T-Lymphocytes , Transcriptome , Cluster Analysis , Phenotype , Transcriptome/genetics
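The core CNA idea (per-sample neighborhood abundances, shared axes of co-variation, then an association test) can be illustrated with a simplified sketch. It assumes each cell already has a single hard neighborhood assignment, whereas CNA defines its own neighborhoods around cells in transcriptional space; it summarizes co-variation with an SVD and tests association with a permutation p-value on a regression R². All function names and the toy data are hypothetical, and this is not the authors' released implementation.

```python
import numpy as np

def neighborhood_abundance_matrix(sample_ids, neighborhood_ids, n_samples, n_neighborhoods):
    """Samples x neighborhoods matrix; each row holds the fraction of that sample's cells per neighborhood.
    Assumes hard neighborhood assignments and at least one cell per sample."""
    nam = np.zeros((n_samples, n_neighborhoods))
    np.add.at(nam, (sample_ids, neighborhood_ids), 1.0)
    return nam / nam.sum(axis=1, keepdims=True)

def covarying_axes(nam, k):
    """Top-k axes of co-variation: SVD of the column-centered matrix gives
    per-sample scores (U * S) and per-neighborhood loadings (rows of Vt)."""
    centered = nam - nam.mean(axis=0)
    u, s, vt = np.linalg.svd(centered, full_matrices=False)
    return u[:, :k] * s[:k], vt[:k]

def association_test(sample_scores, y, n_perm=1000, seed=0):
    """R^2 of regressing the sample attribute on the sample scores, with a permutation p-value."""
    rng = np.random.default_rng(seed)
    X = np.column_stack([np.ones(len(y)), sample_scores])

    def r_squared(target):
        beta, *_ = np.linalg.lstsq(X, target, rcond=None)
        resid = target - X @ beta
        return 1.0 - resid.var() / target.var()

    observed = r_squared(y)
    null = np.array([r_squared(rng.permutation(y)) for _ in range(n_perm)])
    p_value = (1 + np.sum(null >= observed)) / (1 + n_perm)
    return observed, p_value

# Toy data (made up): 40 samples, 5 neighborhoods; cases (y = 1) are enriched in neighborhood 0.
rng = np.random.default_rng(1)
n_samples, n_nbhd, cells_per_sample = 40, 5, 200
y = np.repeat([0.0, 1.0], n_samples // 2)
sample_ids, nbhd_ids = [], []
for s in range(n_samples):
    probs = np.ones(n_nbhd)
    probs[0] += 3.0 * y[s]
    probs /= probs.sum()
    sample_ids.append(np.full(cells_per_sample, s))
    nbhd_ids.append(rng.choice(n_nbhd, size=cells_per_sample, p=probs))
nam = neighborhood_abundance_matrix(np.concatenate(sample_ids), np.concatenate(nbhd_ids), n_samples, n_nbhd)
scores, loadings = covarying_axes(nam, k=2)
r2, p = association_test(scores, y)
print(f"R^2 = {r2:.2f}, permutation p = {p:.4f}")
```

Because the test is on sample-level axes rather than on predefined clusters, an attribute can associate with a pattern spread over many neighborhoods; the loadings show which neighborhoods drive it.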
3.
Bioinformatics ; 37(15): 2103-2111, 2021 Aug 09.
Article in English | MEDLINE | ID: mdl-33532840

ABSTRACT

MOTIVATION: Genome-wide association studies (GWASs) have identified thousands of common trait-associated genetic variants, but interpretation of their function remains challenging. These genetic variants can overlap the binding sites of transcription factors (TFs) and therefore could alter gene expression. However, we currently lack a systematic understanding of how this mechanism contributes to phenotype. RESULTS: We present Motif-Raptor, a TF-centric computational tool that integrates sequence-based predictive models, chromatin accessibility, gene expression datasets and GWAS summary statistics to systematically investigate how TF function is affected by genetic variants. Given trait-associated non-coding variants, Motif-Raptor can recover relevant cell types and critical TFs to drive hypotheses regarding their mechanism of action. We tested Motif-Raptor on complex traits such as rheumatoid arthritis and red blood cell count and demonstrated its ability to prioritize relevant cell types, potential regulatory TFs and non-coding SNPs which have been previously characterized and validated. AVAILABILITY AND IMPLEMENTATION: Motif-Raptor is freely available as a Python package at: https://github.com/pinellolab/MotifRaptor. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

4.
Nat Genet ; 52(12): 1355-1363, 2020 12.
Article in English | MEDLINE | ID: mdl-33199916

ABSTRACT

Fine-mapping aims to identify causal variants impacting complex traits. We propose PolyFun, a computationally scalable framework to improve fine-mapping accuracy by leveraging functional annotations across the entire genome (not just genome-wide-significant loci) to specify prior probabilities for fine-mapping methods such as SuSiE or FINEMAP. In simulations, PolyFun + SuSiE and PolyFun + FINEMAP were well calibrated and identified >20% more variants with a posterior causal probability >0.95 than their nonfunctionally informed counterparts. In analyses of 49 UK Biobank traits (average n = 318,000), PolyFun + SuSiE identified 3,025 fine-mapped variant-trait pairs with posterior causal probability >0.95, a >32% improvement versus SuSiE. We used posterior mean per-SNP heritabilities from PolyFun + SuSiE to perform polygenic localization, constructing minimal sets of common SNPs causally explaining 50% of common SNP heritability; these sets ranged in size from 28 (hair color) to 3,400 (height) to 2 million (number of children). In conclusion, PolyFun prioritizes variants for functional follow-up and provides insights into complex trait architectures.


Subject(s)
Chromosome Mapping/methods , Computational Biology/methods , Genome-Wide Association Study/methods , Multifactorial Inheritance/genetics , Genome, Human/genetics , Humans , Phenotype , Polymorphism, Single Nucleotide/genetics , Quantitative Trait Loci/genetics
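Two steps in this abstract reduce to short computations once per-SNP heritabilities are known. The sketch below assumes those values are already given (PolyFun estimates them with a functionally informed model, which is not reproduced here) and that prior causal probabilities are taken proportional to them within a locus. It then computes the size of the minimal SNP set explaining 50% of heritability, the quantity behind "polygenic localization". Function names and numbers are hypothetical.

```python
import numpy as np

def functional_priors(per_snp_h2):
    """Prior causal probabilities proportional to each SNP's predicted heritability (sums to 1 within the locus)."""
    h2 = np.asarray(per_snp_h2, dtype=float)
    return h2 / h2.sum()

def minimal_set_size(per_snp_h2, fraction=0.5):
    """Smallest number of SNPs whose per-SNP heritabilities sum to at least `fraction` of the total."""
    h2 = np.sort(np.asarray(per_snp_h2, dtype=float))[::-1]  # largest contributions first
    cumulative = np.cumsum(h2)
    # First index where the running total reaches the target; +1 converts an index to a count.
    return int(np.searchsorted(cumulative, fraction * cumulative[-1])) + 1

# Made-up per-SNP heritabilities for two hypothetical traits with 10 SNPs each.
concentrated = np.array([10.0, 1, 1, 1, 1, 1, 1, 1, 1, 1])
diffuse = np.ones(10)
print(minimal_set_size(concentrated), minimal_set_size(diffuse))  # prints: 1 5
print(functional_priors([0.4, 0.3, 0.2, 0.1]))
```

A trait whose heritability is concentrated in a few SNPs (like hair color above) yields a small set, while a highly polygenic trait needs a much larger one.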
5.
Hum Mol Genet ; 29(7): 1057-1067, 2020 05 08.
Article in English | MEDLINE | ID: mdl-31595288

ABSTRACT

Regulatory variation plays a major role in complex disease, and cell type-specific binding of transcription factors (TFs) is critical to gene regulation. However, assessing the contribution of genetic variation in TF-binding sites to disease heritability is challenging, as binding is often cell type-specific and annotations from directly measured TF binding are not currently available for most cell type-TF pairs. We investigate approaches to annotate TF binding, including directly measured chromatin data and sequence-based predictions. We find that TF-binding annotations constructed by intersecting sequence-based TF-binding predictions with cell type-specific chromatin data explain a large fraction of heritability across a broad set of diseases and corresponding cell types; this strategy of constructing annotations addresses both the limitation that identical sequences may be bound or unbound depending on surrounding chromatin context and the limitation that sequence-based predictions are generally not cell type-specific. We partitioned the heritability of 49 diseases and complex traits using stratified linkage disequilibrium (LD) score regression with the baseline-LD model (which is not cell type-specific) plus the new annotations. We determined that 100 bp windows around MotifMap sequence-based TF-binding predictions intersected with a union of six cell type-specific chromatin marks (imputed using ChromImpute) performed best, with a 58% increase in heritability enrichment compared to the chromatin marks alone (11.6× vs. 7.3×, $P = 9 \times 10^{-14}$ for the difference) and a 20% increase in cell type-specific signal conditional on annotations from the baseline-LD model ($P = 8 \times 10^{-11}$ for the difference). Our results show that TF-binding annotations explain substantial disease heritability and can help refine genome-wide association signals.


Subject(s)
Chromatin/genetics , Genetic Diseases, Inborn/genetics , Molecular Sequence Annotation , Transcription Factors/genetics , Binding Sites/genetics , Computational Biology , Gene Expression Regulation/genetics , Genetic Diseases, Inborn/classification , Genetic Diseases, Inborn/pathology , Humans , Linkage Disequilibrium/genetics , Multifactorial Inheritance/genetics , Polymorphism, Single Nucleotide/genetics , Protein Binding/genetics
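Heritability enrichment in stratified LD score regression is the fraction of SNP heritability explained by an annotation divided by the fraction of SNPs it contains. The short sketch below shows that ratio and the relative-increase arithmetic behind the comparison of 11.6× with 7.3×. With the rounded enrichments it gives about 0.589, so the reported 58% presumably reflects unrounded values (an assumption). The example annotation is hypothetical.

```python
def heritability_enrichment(prop_h2, prop_snps):
    """Enrichment = (fraction of SNP heritability in the annotation) / (fraction of SNPs in the annotation)."""
    return prop_h2 / prop_snps

def relative_increase(new, old):
    """Fractional increase of `new` over `old`."""
    return (new - old) / old

# Hypothetical annotation: 2% of SNPs explaining 20% of heritability -> enrichment of about 10x.
print(heritability_enrichment(0.20, 0.02))
# Rounded enrichments from the abstract: 11.6x vs. 7.3x.
print(round(relative_increase(11.6, 7.3), 3))  # prints: 0.589
```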
6.
J Med Internet Res ; 21(9): e13766, 2019 09 12.
Article in English | MEDLINE | ID: mdl-31516124

ABSTRACT

BACKGROUND: The structure of the sexual networks and partnership characteristics of young black men who have sex with men (MSM) may be contributing to their high risk of contracting HIV in the United States. Assortative mixing, which refers to the tendency of individuals to have partners from their own group, has been proposed as a potential explanation for disparities. OBJECTIVE: The objective of this study was to identify the age- and race-related search patterns of users of a diverse geosocial networking mobile app in seven metropolitan areas in the United States to understand the disparities in sexually transmitted infection and HIV risk in MSM communities. METHODS: Data were collected on user behavior between November 2015 and May 2016. Data pertaining to behavior on the app were collected for men who had searched for partners with at least one search parameter narrowed from defaults or used the app to send at least one private chat message and used the app at least once during the study period. The Newman assortativity coefficient (R) was calculated from the study data to understand assortativity patterns of men by race. The Pearson correlation coefficient was used to assess assortativity patterns by age. Heat maps were used to visualize the relationship between searchers' and candidates' characteristics by age band, race, or age band and race. RESULTS: From November 2015 through May 2016, there were 2,989,737 searches in all seven metropolitan areas among 122,417 searchers. Profile viewing was assortative by age, with correlation coefficients ranging from 0.284 (Birmingham) to 0.523 (San Francisco). Men tended to look at the profiles of candidates who matched their race in a highly assortative manner, with R ranging from 0.310 (Birmingham) to 0.566 (Los Angeles). For the initiation of chats, race appeared to be slightly assortative for some groups, with R ranging from 0.023 (Birmingham) to 0.305 (Los Angeles). Asian searchers were most assortative in initiating chats with Asian candidates in Boston, Los Angeles, New York, and San Francisco. In Birmingham and Tampa, searchers from all races tended to initiate chats with black candidates. CONCLUSIONS: Our results indicate that the age preferences of MSM are relatively consistent across cities, in that younger MSM are more likely than older MSM to be chatted with and to have their profiles viewed, but the patterns of racial mixing are more variable. Although some generalizations can be made regarding Web-based behaviors across all cities, city-specific usage patterns and trends should be analyzed to create targeted and localized interventions that may make the most difference in the lives of MSM in these areas.


Subject(s)
HIV Infections/prevention & control , Mobile Applications , Sexual Behavior , Sexual Partners , Sexually Transmitted Diseases/prevention & control , Social Networking , Adolescent , Adult , Black or African American , Cities , HIV Infections/transmission , Health Promotion , Homosexuality, Male , Humans , Male , Sexual and Gender Minorities , Sexually Transmitted Diseases/transmission , United States , Urban Population , Young Adult
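The Newman assortativity coefficient R used for race compares the observed fraction of same-group interactions with the fraction expected under random mixing: R = (Σᵢ eᵢᵢ − Σᵢ aᵢbᵢ) / (1 − Σᵢ aᵢbᵢ), where e is the normalized searcher-by-candidate mixing matrix and a, b are its row and column sums. A minimal sketch, with made-up counts rather than the study's data:

```python
import numpy as np

def newman_assortativity(mixing_counts):
    """Newman's assortativity coefficient for a categorical attribute.

    mixing_counts[i][j] = number of interactions (e.g. profile views) from a
    searcher in category i to a candidate in category j.
    """
    e = np.asarray(mixing_counts, dtype=float)
    e = e / e.sum()            # joint distribution over (searcher, candidate) categories
    a = e.sum(axis=1)          # searcher-side marginal
    b = e.sum(axis=0)          # candidate-side marginal
    expected = float(a @ b)    # same-category fraction expected under random mixing
    return (np.trace(e) - expected) / (1.0 - expected)

# Made-up counts for two groups (not the study's data).
views = [[30, 10],
         [10, 50]]
print(round(newman_assortativity(views), 3))  # prints: 0.583
```

R is 1 when all interactions stay within groups, 0 for random mixing, and negative when interactions preferentially cross groups. For age, which is continuous, the study instead used a Pearson correlation between searcher and candidate ages.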
7.
Genet Epidemiol ; 43(2): 180-188, 2019 Mar.
Article in English | MEDLINE | ID: mdl-30474154

ABSTRACT

Recent studies have examined the genetic correlations of single-nucleotide polymorphism (SNP) effect sizes across pairs of populations to better understand the genetic architectures of complex traits. These studies have estimated $\rho_g$, the cross-population correlation of joint-fit effect sizes at genotyped SNPs. However, the value of $\rho_g$ depends both on the cross-population correlation of true causal effect sizes ($\rho_b$) and on the similarity in linkage disequilibrium (LD) patterns in the two populations, which drive tagging effects. Here, we derive the value of the ratio $\rho_g/\rho_b$ as a function of LD in each population. By applying existing methods to obtain estimates of $\rho_g$, we can use this ratio to estimate $\rho_b$. Our estimates of $\rho_b$ were equal to 0.55 (SE = 0.14) between Europeans and East Asians averaged across nine traits in the Genetic Epidemiology Research on Adult Health and Aging data set, 0.54 (SE = 0.18) between Europeans and South Asians averaged across 13 traits in the UK Biobank data set, and 0.48 (SE = 0.06) and 0.65 (SE = 0.09) between Europeans and East Asians in summary statistic data sets for type 2 diabetes and rheumatoid arthritis, respectively. These results implicate substantially different causal genetic architectures across continental populations.


Subject(s)
Genetics, Population , Adult , Aging/genetics , Arthritis, Rheumatoid/genetics , Biological Specimen Banks , Databases, Genetic , Diabetes Mellitus, Type 2/genetics , Genotype , Humans , Phenotype , Quantitative Trait, Heritable , United Kingdom
8.
Nat Genet ; 50(10): 1483-1493, 2018 10.
Article in English | MEDLINE | ID: mdl-30177862

ABSTRACT

Biological interpretation of genome-wide association study data frequently involves assessing whether SNPs linked to a biological process, for example, binding of a transcription factor, show unsigned enrichment for disease signal. However, signed annotations quantifying whether each SNP allele promotes or hinders the biological process can enable stronger statements about disease mechanism. We introduce a method, signed linkage disequilibrium profile regression, for detecting genome-wide directional effects of signed functional annotations on disease risk. We validate the method via simulations and application to molecular quantitative trait loci in blood, recovering known transcriptional regulators. We apply the method to expression quantitative trait loci in 48 Genotype-Tissue Expression tissues, identifying 651 transcription factor-tissue associations including 30 with robust evidence of tissue specificity. We apply the method to 46 diseases and complex traits (average n = 290K), identifying 77 annotation-trait associations representing 12 independent transcription factor-trait associations, and characterize the underlying transcriptional programs using gene-set enrichment analyses. Our results implicate new causal disease genes and new disease mechanisms.


Subject(s)
Disease/genetics , Genome-Wide Association Study , Multifactorial Inheritance/genetics , Quantitative Trait Loci , Transcription Factors/metabolism , Binding Sites/genetics , Blood Cells/metabolism , Blood Cells/pathology , Blood Chemical Analysis , Gene Expression Regulation , Genetic Predisposition to Disease , Humans , Linkage Disequilibrium , Phenotype , Polymorphism, Single Nucleotide , Protein Binding , Risk Factors
9.
Nature ; 559(7714): 350-355, 2018 07.
Article in English | MEDLINE | ID: mdl-29995854

ABSTRACT

The selective pressures that shape clonal evolution in healthy individuals are largely unknown. Here we investigate 8,342 mosaic chromosomal alterations, from 50 kb to 249 Mb long, that we uncovered in blood-derived DNA from 151,202 UK Biobank participants using phase-based computational techniques (estimated false discovery rate, 6-9%). We found six loci at which inherited variants associated strongly with the acquisition of deletions or loss of heterozygosity in cis. At three such loci (MPL, TM2D3-TARSL2, and FRA10B), we identified a likely causal variant that acted with high penetrance (5-50%). Inherited alleles at one locus appeared to affect the probability of somatic mutation, and at three other loci to be objects of positive or negative clonal selection. Several specific mosaic chromosomal alterations were strongly associated with future haematological malignancies. Our results reveal a multitude of paths towards clonal expansions with a wide range of effects on human health.


Subject(s)
Chromosome Aberrations , Clone Cells/cytology , Clone Cells/metabolism , Hematopoiesis/genetics , Mosaicism , Adult , Aged , Alleles , Biological Specimen Banks , Chromosome Breakage , Chromosome Fragile Sites/genetics , Chromosomes, Human, Pair 10/genetics , Female , Health , Hematologic Neoplasms/genetics , Hematologic Neoplasms/mortality , Humans , Male , Middle Aged , Penetrance , United Kingdom
10.
Nat Genet ; 50(7): 1041-1047, 2018 07.
Article in English | MEDLINE | ID: mdl-29942083

ABSTRACT

There is increasing evidence that many risk loci found using genome-wide association studies are molecular quantitative trait loci (QTLs). Here we introduce a new set of functional annotations based on causal posterior probabilities of fine-mapped molecular cis-QTLs, using data from the Genotype-Tissue Expression (GTEx) and BLUEPRINT consortia. We show that these annotations are more strongly enriched for heritability across 41 diseases and complex traits (5.84× for expression QTLs (eQTLs); $P = 1.19 \times 10^{-31}$) than annotations containing all significant molecular QTLs (1.80× for eQTLs). eQTL annotations obtained by meta-analyzing all GTEx tissues generally performed best, whereas tissue-specific eQTL annotations produced stronger enrichments for blood- and brain-related diseases and traits. eQTL annotations restricted to loss-of-function intolerant genes were even more enriched for heritability (17.06×; $P = 1.20 \times 10^{-35}$). All molecular QTLs except splicing QTLs remained significantly enriched in joint analysis, indicating that each of these annotations is uniquely informative for disease and complex trait architectures.


Subject(s)
Disease/genetics , Multifactorial Inheritance , Quantitative Trait Loci , Genome-Wide Association Study/methods , Humans , Phenotype , Polymorphism, Single Nucleotide , Quantitative Trait, Heritable
11.
Nat Genet ; 50(4): 621-629, 2018 04.
Article in English | MEDLINE | ID: mdl-29632380

ABSTRACT

We introduce an approach to identify disease-relevant tissues and cell types by analyzing gene expression data together with genome-wide association study (GWAS) summary statistics. Our approach uses stratified linkage disequilibrium (LD) score regression to test whether disease heritability is enriched in regions surrounding genes with the highest specific expression in a given tissue. We applied our approach to gene expression data from several sources together with GWAS summary statistics for 48 diseases and traits (average N = 169,331) and found significant tissue-specific enrichments (false discovery rate (FDR) < 5%) for 34 traits. In our analysis of multiple tissues, we detected a broad range of enrichments that recapitulated known biology. In our brain-specific analysis, significant enrichments included an enrichment of inhibitory over excitatory neurons for bipolar disorder, and excitatory over inhibitory neurons for schizophrenia and body mass index. Our results demonstrate that our polygenic approach is a powerful way to leverage gene expression data for interpreting GWAS signals.


Subject(s)
Gene Expression , Genetic Predisposition to Disease , Bipolar Disorder/genetics , Body Mass Index , Brain/metabolism , Chromatin/genetics , Epigenesis, Genetic , Gene Expression Profiling/statistics & numerical data , Genome-Wide Association Study/statistics & numerical data , Humans , Immune System Diseases/genetics , Linkage Disequilibrium , Models, Genetic , Multifactorial Inheritance , Neurons/metabolism , Schizophrenia/genetics , Tissue Distribution/genetics
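The abstract does not say how "highest specific expression" is measured. The sketch below assumes a per-gene Welch t-statistic comparing focal-tissue samples with all other samples, keeps the top 10% of genes, and adds a 100 kb flank on each side of each gene; these choices are assumptions here. SNPs in the resulting windows would form the annotation tested with stratified LD score regression (not shown). Function names and toy data are hypothetical.

```python
import numpy as np

def specificity_t(expr, is_focal):
    """Per-gene Welch t-statistic comparing focal-tissue samples with all other samples.

    expr: genes x samples expression matrix; is_focal: boolean mask over samples.
    """
    focal, rest = expr[:, is_focal], expr[:, ~is_focal]
    diff = focal.mean(axis=1) - rest.mean(axis=1)
    se = np.sqrt(focal.var(axis=1, ddof=1) / focal.shape[1]
                 + rest.var(axis=1, ddof=1) / rest.shape[1])
    return diff / se

def top_specific_genes(t_stats, fraction=0.1):
    """Indices of the top `fraction` of genes by specificity, most specific first."""
    n_top = max(1, int(round(fraction * len(t_stats))))
    return np.argsort(t_stats)[::-1][:n_top]

def gene_windows(starts, ends, flank=100_000):
    """Genomic windows around genes (clipped at 0); SNPs inside them would form the annotation."""
    return [(max(0, s - flank), e + flank) for s, e in zip(starts, ends)]

# Toy data (made up): 50 genes x 40 samples, the first 10 samples from the focal tissue.
rng = np.random.default_rng(0)
expr = rng.normal(size=(50, 40))
is_focal = np.zeros(40, dtype=bool)
is_focal[:10] = True
expr[3, is_focal] += 5.0  # gene 3 is strongly up-regulated in the focal tissue
t = specificity_t(expr, is_focal)
top = top_specific_genes(t, fraction=0.1)
print(top)
```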
12.
Nat Genet ; 50(4): 538-548, 2018 04.
Article in English | MEDLINE | ID: mdl-29632383

ABSTRACT

Genome-wide association studies (GWAS) have identified over 100 risk loci for schizophrenia, but the causal mechanisms remain largely unknown. We performed a transcriptome-wide association study (TWAS) integrating a schizophrenia GWAS of 79,845 individuals from the Psychiatric Genomics Consortium with expression data from brain, blood, and adipose tissues across 3,693 primarily control individuals. We identified 157 TWAS-significant genes, of which 35 did not overlap a known GWAS locus. Of these 157 genes, 42 were associated with specific chromatin features measured in independent samples, thus highlighting potential regulatory targets for follow-up. Suppression of one identified susceptibility gene, mapk3, in zebrafish showed a significant effect on neurodevelopmental phenotypes. Expression and splicing from the brain captured most of the TWAS effect across all genes. This large-scale connection of associations to target genes, tissues, and regulatory features is an essential step in moving toward a mechanistic understanding of GWAS.


Subject(s)
Chromatin/genetics , Schizophrenia/etiology , Schizophrenia/genetics , Animals , Brain/metabolism , Gene Dosage , Gene Expression Profiling/methods , Genetic Predisposition to Disease , Genome-Wide Association Study/methods , Humans , Kinesins , Microtubule-Associated Proteins/genetics , Mitogen-Activated Protein Kinase 3/genetics , Multifactorial Inheritance , Protein Phosphatase 2/genetics , Quantitative Trait Loci , Zebrafish/genetics , Zebrafish/growth & development , Zebrafish Proteins/genetics
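A TWAS tests whether genetically predicted expression of a gene associates with the trait. One widely used summary-statistic form (as in the FUSION software) combines expression weights w for a gene's cis-SNPs with GWAS z-scores z and the LD matrix R as wᵀz / √(wᵀRw). The abstract does not state the exact statistic used, so treat this as an illustrative assumption; the weights and z-scores below are made up.

```python
import numpy as np

def twas_z(weights, gwas_z, ld):
    """Weighted z-score for one gene: w.z / sqrt(w' R w).

    weights: expression weights for the gene's cis-SNPs (from a reference expression panel)
    gwas_z:  GWAS z-scores for the same SNPs
    ld:      SNP-SNP correlation (LD) matrix for those SNPs
    """
    w = np.asarray(weights, dtype=float)
    z = np.asarray(gwas_z, dtype=float)
    R = np.asarray(ld, dtype=float)
    return float(w @ z) / float(np.sqrt(w @ R @ w))

# Made-up example: two SNPs with equal weights and z = 2 each.
w, z = [1.0, 1.0], [2.0, 2.0]
print(round(twas_z(w, z, np.eye(2)), 3))                 # independent SNPs: 2.828
print(round(twas_z(w, z, [[1.0, 1.0], [1.0, 1.0]]), 3))  # perfect LD: 2.0
```

The LD term in the denominator keeps correlated SNPs from being counted as independent evidence, which is why the same z-scores give a smaller statistic under perfect LD.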
13.
Genome Res ; 28(5): 739-750, 2018 05.
Article in English | MEDLINE | ID: mdl-29588361

ABSTRACT

Models for predicting phenotypic outcomes from genotypes have important applications to understanding genomic function and improving human health. Here, we develop a machine-learning system to predict cell-type-specific epigenetic and transcriptional profiles in large mammalian genomes from DNA sequence alone. By use of convolutional neural networks, this system identifies promoters and distal regulatory elements and synthesizes their content to make effective gene expression predictions. We show that model predictions for the influence of genomic variants on gene expression align well to causal variants underlying eQTLs in human populations and can be useful for generating mechanistic hypotheses to enable fine mapping of disease loci.


Subject(s)
Chromosomes/genetics , Computational Biology/methods , Neural Networks, Computer , Regulatory Sequences, Nucleic Acid/genetics , Animals , Epigenomics/methods , Gene Expression Profiling/methods , Gene Expression Regulation , Genomics/methods , Humans , Machine Learning , Models, Genetic , Polymorphism, Single Nucleotide , Promoter Regions, Genetic/genetics
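The first layer of a sequence-based convolutional model scans one-hot-encoded DNA with motif-like filters, and a variant's predicted effect can be read off as the change in model output between alleles. The sketch below uses a single hand-set "TATA" filter with one ReLU layer; it is not the paper's trained network, which learns many filters and stacks further layers to predict epigenetic and expression profiles.

```python
import numpy as np

BASES = "ACGT"

def one_hot(seq):
    """(length x 4) one-hot encoding; N or other ambiguous bases stay all-zero."""
    index = {base: i for i, base in enumerate(BASES)}
    x = np.zeros((len(seq), 4))
    for pos, base in enumerate(seq.upper()):
        if base in index:
            x[pos, index[base]] = 1.0
    return x

def conv1d_relu(x, filters, bias=0.0):
    """'Valid' 1D cross-correlation of a one-hot sequence with (n_filters x width x 4) filters, then ReLU."""
    n_filters, width, _ = filters.shape
    n_pos = x.shape[0] - width + 1
    out = np.empty((n_pos, n_filters))
    for p in range(n_pos):
        out[p] = np.tensordot(filters, x[p:p + width], axes=([1, 2], [0, 1])) + bias
    return np.maximum(out, 0.0)

# Hand-set (not learned) filter that scores 1 only on an exact "TATA" match.
tata_filter = one_hot("TATA")[None, :, :]
ref = conv1d_relu(one_hot("GGTATAGG"), tata_filter, bias=-3.0)
alt = conv1d_relu(one_hot("GGTACAGG"), tata_filter, bias=-3.0)  # T>C variant inside the motif
print(ref[:, 0], alt[:, 0])
print("predicted variant effect:", alt.max() - ref.max())  # -1.0
```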
14.
Nat Genet ; 48(11): 1443-1448, 2016 11.
Article in English | MEDLINE | ID: mdl-27694958

ABSTRACT

Haplotype phasing is a fundamental problem in medical and population genetics. Phasing is generally performed via statistical phasing in a genotyped cohort, an approach that can yield high accuracy in very large cohorts but attains lower accuracy in smaller cohorts. Here we instead explore the paradigm of reference-based phasing. We introduce a new phasing algorithm, Eagle2, that attains high accuracy across a broad range of cohort sizes by efficiently leveraging information from large external reference panels (such as the Haplotype Reference Consortium; HRC) using a new data structure based on the positional Burrows-Wheeler transform. We demonstrate that Eagle2 attains a ∼20× speedup and ∼10% increase in accuracy compared to reference-based phasing using SHAPEIT2. On European-ancestry samples, Eagle2 with the HRC panel achieves >2× the accuracy of 1000 Genomes-based phasing. Eagle2 is open source and freely available for HRC-based phasing via the Sanger Imputation Service and the Michigan Imputation Server.


Subject(s)
Algorithms , Haplotypes , Cohort Studies , Female , Genotype , Humans , Male , Reference Values
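The positional Burrows-Wheeler transform (PBWT) keeps haplotypes sorted by their reversed prefixes as it sweeps across sites, so that haplotypes sharing the longest matches ending at the current site sit next to each other. The sketch below shows only the basic prefix-array construction (a stable partition by allele at each site); it is not Eagle2's own data structure, which the abstract describes as based on the PBWT.

```python
def pbwt_prefix_orders(haplotypes):
    """Positional prefix arrays of the PBWT.

    haplotypes: list of equal-length 0/1 lists (one per haplotype).
    Returns orders[k] = haplotype indices sorted by their reversed prefix over sites 0..k-1.
    """
    n_sites = len(haplotypes[0])
    order = list(range(len(haplotypes)))
    orders = [order]
    for k in range(n_sites):
        # Stable partition by the allele at site k keeps the previous ordering within each group.
        zeros = [h for h in order if haplotypes[h][k] == 0]
        ones = [h for h in order if haplotypes[h][k] == 1]
        order = zeros + ones
        orders.append(order)
    return orders

haps = [[0, 1, 0],
        [1, 1, 0],
        [0, 0, 1],
        [1, 0, 1]]
orders = pbwt_prefix_orders(haps)
for k, order in enumerate(orders):
    print(k, order)
# 0 [0, 1, 2, 3]
# 1 [0, 2, 1, 3]
# 2 [2, 3, 0, 1]
# 3 [0, 1, 2, 3]
```

Each step is linear in the number of haplotypes, which is what makes PBWT-style structures attractive for matching against large reference panels.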
15.
Nat Genet ; 47(11): 1228-35, 2015 Nov.
Article in English | MEDLINE | ID: mdl-26414678

ABSTRACT

Recent work has demonstrated that some functional categories of the genome contribute disproportionately to the heritability of complex diseases. Here we analyze a broad set of functional elements, including cell type-specific elements, to estimate their polygenic contributions to heritability in genome-wide association studies (GWAS) of 17 complex diseases and traits with an average sample size of 73,599. To enable this analysis, we introduce a new method, stratified LD score regression, for partitioning heritability from GWAS summary statistics while accounting for linked markers. This new method is computationally tractable at very large sample sizes and leverages genome-wide information. Our findings include a large enrichment of heritability in conserved regions across many traits, a very large immunological disease-specific enrichment of heritability in FANTOM5 enhancers and many cell type-specific enrichments, including significant enrichment of central nervous system cell types in the heritability of body mass index, age at menarche, educational attainment and smoking behavior.


Subject(s)
Disease/genetics , Genetic Predisposition to Disease/genetics , Genome-Wide Association Study/methods , Polymorphism, Single Nucleotide , Algorithms , Computer Simulation , Female , Gene Frequency , Histones/metabolism , Humans , Inheritance Patterns , Lysine/metabolism , Male , Methylation , Models, Genetic
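Stratified LD score regression models the expected GWAS χ² of SNP j as E[χ²ⱼ] = N Σ_C τ_C ℓ(j, C) + Na + 1, where ℓ(j, C) = Σₖ r²ⱼₖ a_C(k) is the stratified LD score of SNP j with category C. The sketch below computes those LD scores from a small correlation matrix and recovers τ by ordinary least squares on noiseless toy data. It deliberately omits the regression weights, reference-panel LD windows, and block jackknife of the published method; names and numbers are hypothetical.

```python
import numpy as np

def stratified_ld_scores(r, annot):
    """l(j, C) = sum_k r_jk^2 * a_C(k) for every SNP j and annotation category C.

    r: SNP x SNP correlation matrix; annot: SNP x category 0/1 matrix.
    """
    return (np.asarray(r, dtype=float) ** 2) @ np.asarray(annot, dtype=float)

def fit_taus(chi2, ld, n):
    """Ordinary least squares for E[chi2_j] = N * sum_C tau_C * l(j, C) + intercept.

    Simplification: no regression weights and no block-jackknife standard errors.
    """
    X = np.column_stack([n * ld, np.ones(len(chi2))])
    coef, *_ = np.linalg.lstsq(X, chi2, rcond=None)
    return coef[:-1], coef[-1]

# Toy example: 6 SNPs, neighbours correlated at r = 0.5, a base category plus one annotation.
r = np.eye(6) + 0.5 * (np.eye(6, k=1) + np.eye(6, k=-1))
annot = np.column_stack([np.ones(6), [1, 1, 0, 0, 0, 1]])
ld = stratified_ld_scores(r, annot)
n = 10_000
tau_true = np.array([1e-5, 4e-5])
chi2 = n * ld @ tau_true + 1.05  # noiseless expected chi^2 with intercept 1.05
taus, intercept = fit_taus(chi2, ld, n)
print(taus, intercept)
```

Because each τ_C is estimated while controlling for the other categories' LD scores, signal from SNPs merely in LD with an annotation is not credited to it.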
17.
J Comput Biol ; 19(9): 998-1014, 2012 Sep.
Article in English | MEDLINE | ID: mdl-22897201

ABSTRACT

Pedigree graphs, or family trees, are typically constructed by an expensive process of examining genealogical records to determine which pairs of individuals are parent and child. New methods to automate this process take as input genetic data from a set of extant individuals and reconstruct ancestral individuals. There is a great need to evaluate the quality of these methods by comparing the estimated pedigree to the true pedigree. In this article, we consider two main pedigree comparison problems. The first is the pedigree isomorphism problem, for which we present a linear-time algorithm for leaf-labeled pedigrees. The second is the pedigree edit distance problem, for which we present (1) several algorithms that are fast and exact in various special cases, and (2) a general, randomized heuristic algorithm. In the negative direction, we first prove that the pedigree isomorphism problem is as hard as the general graph isomorphism problem, and that the sub-pedigree isomorphism problem is NP-hard. We then show that the pedigree edit distance problem is APX-hard in general and NP-hard on leaf-labeled pedigrees. We use simulated pedigrees to compare our edit-distance algorithms to each other as well as to a branch-and-bound algorithm that always finds an optimal solution.


Subject(s)
Algorithms , Computer Simulation , Models, Genetic , Pedigree , Artificial Intelligence , Humans
18.
Science ; 334(6062): 1518-24, 2011 Dec 16.
Article in English | MEDLINE | ID: mdl-22174245

ABSTRACT

Identifying interesting relationships between pairs of variables in large data sets is increasingly important. Here, we present a measure of dependence for two-variable relationships: the maximal information coefficient (MIC). MIC captures a wide range of associations both functional and not, and for functional relationships provides a score that roughly equals the coefficient of determination ($R^2$) of the data relative to the regression function. MIC belongs to a larger class of maximal information-based nonparametric exploration (MINE) statistics for identifying and classifying relationships. We apply MIC and MINE to data sets in global health, gene expression, major-league baseball, and the human gut microbiota and identify known and novel relationships.


Subject(s)
Data Interpretation, Statistical , Algorithms , Animals , Baseball/statistics & numerical data , Female , Gene Expression , Genes, Fungal , Genomics/methods , Humans , Intestines/microbiology , Male , Metagenome , Mice , Obesity , Saccharomyces cerevisiae/genetics
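MIC takes, over grids of size kx × ky within a budget, the mutual information on the grid normalized by log min(kx, ky), and reports the maximum. The sketch below is a simplified MIC-style score that only considers equal-frequency grids and assumes the budget kx·ky ≤ n^0.6. The published MIC also searches over where the grid lines fall, so this sketch cannot exceed what an exhaustive search over grid placements would give. Function names are hypothetical.

```python
import numpy as np

def _equal_frequency_bins(v, k):
    """Assign each value to one of k bins holding (nearly) equal numbers of points, by rank."""
    ranks = np.argsort(np.argsort(v, kind="stable"), kind="stable")
    return (ranks * k) // len(v)

def _grid_mutual_information(xb, yb, kx, ky):
    """Mutual information (bits) of the empirical joint distribution on a kx x ky grid."""
    joint = np.zeros((kx, ky))
    np.add.at(joint, (xb, yb), 1.0)
    joint /= joint.sum()
    expected = joint.sum(axis=1, keepdims=True) @ joint.sum(axis=0, keepdims=True)
    nz = joint > 0
    return float(np.sum(joint[nz] * np.log2(joint[nz] / expected[nz])))

def mic_equipartition(x, y, alpha=0.6):
    """MIC-style score restricted to equal-frequency grids with kx * ky <= n**alpha."""
    budget = len(x) ** alpha
    best = 0.0
    for kx in range(2, int(budget) + 1):
        for ky in range(2, int(budget // kx) + 1):
            mi = _grid_mutual_information(_equal_frequency_bins(x, kx),
                                          _equal_frequency_bins(y, ky), kx, ky)
            best = max(best, mi / np.log2(min(kx, ky)))  # normalized to [0, 1]
    return best

x = np.linspace(-1.0, 1.0, 400)
noise = np.random.default_rng(0).uniform(-1.0, 1.0, 400)
print("line     :", round(mic_equipartition(x, x), 3))
print("parabola :", round(mic_equipartition(x, x ** 2), 3),
      " Pearson r:", round(np.corrcoef(x, x ** 2)[0, 1], 3))
print("noise    :", round(mic_equipartition(x, noise), 3))
```

The parabola illustrates the point of the abstract: a noiseless but non-monotonic relationship scores near 1 even though its Pearson correlation is about 0.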