2.
Nat Biotechnol ; 40(3): 355-363, 2022 03.
Article in English | MEDLINE | ID: mdl-34675423

ABSTRACT

As single-cell datasets grow in sample size, there is a critical need to characterize cell states that vary across samples and associate with sample attributes, such as clinical phenotypes. Current statistical approaches typically map cells to clusters and then assess differences in cluster abundance. Here we present co-varying neighborhood analysis (CNA), an unbiased method to identify associated cell populations with greater flexibility than cluster-based approaches. CNA characterizes dominant axes of variation across samples by identifying groups of small regions in transcriptional space (termed neighborhoods) that co-vary in abundance across samples, suggesting shared function or regulation. CNA performs statistical testing for associations between any sample-level attribute and the abundances of these co-varying neighborhood groups. Simulations show that CNA enables more sensitive and accurate identification of disease-associated cell states than a cluster-based approach. When applied to published datasets, CNA captures a Notch activation signature in rheumatoid arthritis, identifies monocyte populations expanded in sepsis and identifies a novel T cell population associated with progression to active tuberculosis.


Subjects
T-Lymphocytes; Transcriptome; Cluster Analysis; Phenotype; Transcriptome/genetics
3.
Bioinformatics ; 37(15): 2103-2111, 2021 Aug 09.
Article in English | MEDLINE | ID: mdl-33532840

ABSTRACT

MOTIVATION: Genome-wide association studies (GWASs) have identified thousands of common trait-associated genetic variants but interpretation of their function remains challenging. These genetic variants can overlap the binding sites of transcription factors (TFs) and therefore could alter gene expression. However, we currently lack a systematic understanding of how this mechanism contributes to phenotype. RESULTS: We present Motif-Raptor, a TF-centric computational tool that integrates sequence-based predictive models, chromatin accessibility, gene expression datasets and GWAS summary statistics to systematically investigate how TF function is affected by genetic variants. Given trait-associated non-coding variants, Motif-Raptor can recover relevant cell types and critical TFs to drive hypotheses regarding their mechanism of action. We tested Motif-Raptor on complex traits such as rheumatoid arthritis and red blood cell count and demonstrated its ability to prioritize relevant cell types, potential regulatory TFs and non-coding SNPs which have been previously characterized and validated. AVAILABILITY AND IMPLEMENTATION: Motif-Raptor is freely available as a Python package at: https://github.com/pinellolab/MotifRaptor. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

4.
Nat Genet ; 52(12): 1355-1363, 2020 12.
Article in English | MEDLINE | ID: mdl-33199916

ABSTRACT

Fine-mapping aims to identify causal variants impacting complex traits. We propose PolyFun, a computationally scalable framework to improve fine-mapping accuracy by leveraging functional annotations across the entire genome (not just genome-wide-significant loci) to specify prior probabilities for fine-mapping methods such as SuSiE or FINEMAP. In simulations, PolyFun + SuSiE and PolyFun + FINEMAP were well calibrated and identified >20% more variants with a posterior causal probability >0.95 than identified in their nonfunctionally informed counterparts. In analyses of 49 UK Biobank traits (average n = 318,000), PolyFun + SuSiE identified 3,025 fine-mapped variant-trait pairs with posterior causal probability >0.95, a >32% improvement versus SuSiE. We used posterior mean per-SNP heritabilities from PolyFun + SuSiE to perform polygenic localization, constructing minimal sets of common SNPs causally explaining 50% of common SNP heritability; these sets ranged in size from 28 (hair color) to 3,400 (height) to 2 million (number of children). In conclusion, PolyFun prioritizes variants for functional follow-up and provides insights into complex trait architectures.


Subjects
Chromosome Mapping/methods; Computational Biology/methods; Genome-Wide Association Study/methods; Multifactorial Inheritance/genetics; Genome, Human/genetics; Humans; Phenotype; Polymorphism, Single Nucleotide/genetics; Quantitative Trait Loci/genetics
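
Illustrative note on the polygenic localization step in the abstract above: one simple way to read "minimal sets of common SNPs explaining 50% of common SNP heritability" is to sort per-SNP heritability estimates from largest to smallest and keep adding SNPs until their sum reaches the target fraction. The sketch below assumes exactly that reading; the function name `minimal_snp_set` and the toy values are hypothetical and are not taken from PolyFun, whose actual procedure may differ.

```python
def minimal_snp_set(per_snp_h2, fraction=0.5):
    """Return indices of the smallest set of SNPs whose summed
    per-SNP heritability reaches `fraction` of the total.

    per_snp_h2: list of non-negative per-SNP heritability estimates
    (hypothetical input; in the abstract these are posterior means).
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    total = sum(per_snp_h2)
    if total <= 0:
        raise ValueError("total heritability must be positive")

    # Taking the largest contributions first yields the smallest
    # possible set when the quantity of interest is a plain sum.
    order = sorted(range(len(per_snp_h2)),
                   key=lambda i: per_snp_h2[i], reverse=True)

    chosen, running = [], 0.0
    for i in order:
        chosen.append(i)
        running += per_snp_h2[i]
        if running >= fraction * total:
            break
    return chosen


# Toy values for illustration only (not from the paper).
toy_h2 = [0.01, 0.20, 0.05, 0.25, 0.04]
print(minimal_snp_set(toy_h2))
```

The size of the returned set is the quantity the abstract reports per trait: a small set suggests heritability concentrated in a few SNPs, and a very large set suggests a highly polygenic trait.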
5.
Hum Mol Genet ; 29(7): 1057-1067, 2020 05 08.
Article in English | MEDLINE | ID: mdl-31595288

ABSTRACT

Regulatory variation plays a major role in complex disease, and cell type-specific binding of transcription factors (TFs) is critical to gene regulation. However, assessing the contribution of genetic variation in TF-binding sites to disease heritability is challenging, as binding is often cell type-specific and annotations from directly measured TF binding are not currently available for most cell type-TF pairs. We investigate approaches to annotate TF binding, including directly measured chromatin data and sequence-based predictions. We find that TF-binding annotations constructed by intersecting sequence-based TF-binding predictions with cell type-specific chromatin data explain a large fraction of heritability across a broad set of diseases and corresponding cell types; this strategy of constructing annotations addresses both the limitation that identical sequences may be bound or unbound depending on surrounding chromatin context and the limitation that sequence-based predictions are generally not cell type-specific. We partitioned the heritability of 49 diseases and complex traits using stratified linkage disequilibrium (LD) score regression with the baseline-LD model (which is not cell type-specific) plus the new annotations. We determined that 100 bp windows around MotifMap sequence-based TF-binding predictions intersected with a union of six cell type-specific chromatin marks (imputed using ChromImpute) performed best, with a 58% increase in heritability enrichment compared to the chromatin marks alone (11.6× vs. 7.3×, P = 9 × 10^-14 for difference) and a 20% increase in cell type-specific signal conditional on annotations from the baseline-LD model (P = 8 × 10^-11 for difference). Our results show that TF-binding annotations explain substantial disease heritability and can help refine genome-wide association signals.


Subjects
Chromatin/genetics; Genetic Diseases, Inborn/genetics; Molecular Sequence Annotation; Transcription Factors/genetics; Binding Sites/genetics; Computational Biology; Gene Expression Regulation/genetics; Genetic Diseases, Inborn/classification; Genetic Diseases, Inborn/pathology; Humans; Linkage Disequilibrium/genetics; Multifactorial Inheritance/genetics; Polymorphism, Single Nucleotide/genetics; Protein Binding/genetics
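
Illustrative note on the annotation strategy in the abstract above: the best-performing annotation is described as windows around sequence-based binding predictions intersected with a union of chromatin marks. A minimal interval-arithmetic sketch of that idea follows. It assumes half-open coordinates on a single chromosome and a 100 bp window centered on each predicted site (the abstract does not say how the window is placed); all function names and coordinates are hypothetical and do not come from MotifMap or ChromImpute.

```python
def merge_intervals(intervals):
    """Union of half-open [start, end) intervals, returned sorted
    and non-overlapping."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Overlaps or touches the previous interval: extend it.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [tuple(iv) for iv in merged]


def intersect(a, b):
    """Intersection of two sorted, non-overlapping interval lists."""
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        lo = max(a[i][0], b[j][0])
        hi = min(a[i][1], b[j][1])
        if lo < hi:
            out.append((lo, hi))
        # Advance whichever interval ends first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return out


def tf_annotation(motif_sites, chromatin_marks, window=100):
    """motif_sites: predicted binding positions (hypothetical input).
    chromatin_marks: one interval list per chromatin mark.
    Returns windows around the sites restricted to the union of marks."""
    half = window // 2
    windows = merge_intervals(
        [(max(0, p - half), p + half) for p in motif_sites])
    marks_union = merge_intervals(
        [iv for mark in chromatin_marks for iv in mark])
    return intersect(windows, marks_union)


# Toy coordinates for illustration only.
sites = [1000, 5000]
marks = [[(900, 1020)], [(1010, 1100), (7000, 7100)]]
print(tf_annotation(sites, marks))
```

In this toy case only the site at position 1000 lies in marked chromatin, so only its window survives; the site at 5000 is dropped, which mirrors the point that identical sequences may be bound or unbound depending on chromatin context.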
6.
J Med Internet Res ; 21(9): e13766, 2019 09 12.
Article in English | MEDLINE | ID: mdl-31516124

ABSTRACT

BACKGROUND: The structure of the sexual networks and partnership characteristics of young black men who have sex with men (MSM) may be contributing to their high risk of contracting HIV in the United States. Assortative mixing, which refers to the tendency of individuals to have partners from their own group, has been proposed as a potential explanation for disparities. OBJECTIVE: The objective of this study was to identify the age- and race-related search patterns of users of a diverse geosocial networking mobile app in seven metropolitan areas in the United States to understand the disparities in sexually transmitted infection and HIV risk in MSM communities. METHODS: Data were collected on user behavior between November 2015 and May 2016. Data pertaining to behavior on the app were collected for men who had searched for partners with at least one search parameter narrowed from defaults or used the app to send at least one private chat message and used the app at least once during the study period. The Newman assortativity coefficient (R) was calculated from the study data to understand assortativity patterns of men by race. The Pearson correlation coefficient was used to assess assortativity patterns by age. Heat maps were used to visualize the relationship between searcher's and candidate's characteristics by age band, race, or age band and race. RESULTS: From November 2015 through May 2016, there were 2,989,737 searches in all seven metropolitan areas among 122,417 searchers. Assortativity by age was important for looking at the profiles of candidates with correlation coefficients ranging from 0.284 (Birmingham) to 0.523 (San Francisco). Men tended to look at the profiles of candidates that matched their race in a highly assortative manner with R ranging from 0.310 (Birmingham) to 0.566 (Los Angeles). For the initiation of chats, race appeared to be slightly assortative for some groups with R ranging from 0.023 (Birmingham) to 0.305 (Los Angeles). Asian searchers were most assortative in initiating chats with Asian candidates in Boston, Los Angeles, New York, and San Francisco. In Birmingham and Tampa, searchers from all races tended to initiate chats with black candidates. CONCLUSIONS: Our results indicate that the age preferences of MSM are relatively consistent across cities, that is, younger MSM are more likely to be chatted with and have their profiles viewed compared with older MSM, but the patterns of racial mixing are more variable. Although some generalizations can be made regarding Web-based behaviors across all cities, city-specific usage patterns and trends should be analyzed to create targeted and localized interventions that may make the most difference in the lives of MSM in these areas.


Subjects
HIV Infections/prevention & control; Mobile Applications; Sexual Behavior; Sexual Partners; Sexually Transmitted Diseases/prevention & control; Social Networking; Adolescent; Adult; Black or African American; Cities; HIV Infections/transmission; Health Promotion; Homosexuality, Male; Humans; Male; Sexual and Gender Minorities; Sexually Transmitted Diseases/transmission; United States; Urban Population; Young Adult
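
Illustrative note on the Newman assortativity coefficient used in the abstract above: for a categorical attribute it is commonly computed from a mixing matrix e, where e[i][j] is the fraction of interactions from group i to group j, as r = (sum_i e[i][i] - sum_i a[i]*b[i]) / (1 - sum_i a[i]*b[i]), with a and b the row and column sums of e. The sketch below applies that standard formula to a table of counts; the function name and the example counts are hypothetical, and the study's exact data preparation is not described in the abstract.

```python
def newman_assortativity(counts):
    """Newman assortativity coefficient for a square table of counts.

    counts[i][j]: number of interactions (e.g. profile views) from a
    searcher in group i to a candidate in group j (hypothetical layout).
    """
    k = len(counts)
    if any(len(row) != k for row in counts):
        raise ValueError("counts must be a square table")
    total = sum(sum(row) for row in counts)
    if total == 0:
        raise ValueError("counts must contain at least one interaction")

    # Normalize counts to the mixing matrix e.
    e = [[c / total for c in row] for row in counts]
    a = [sum(e[i]) for i in range(k)]                        # row sums
    b = [sum(e[i][j] for i in range(k)) for j in range(k)]   # column sums

    observed = sum(e[i][i] for i in range(k))      # within-group share
    expected = sum(a[i] * b[i] for i in range(k))  # share under random mixing
    if expected == 1:
        raise ValueError("coefficient undefined: all mass in one group")
    return (observed - expected) / (1 - expected)


# Toy counts for illustration only (not study data).
print(newman_assortativity([[40, 10], [10, 40]]))
```

A value of 1 means every interaction stays within the searcher's own group, 0 means mixing is indistinguishable from random, and negative values indicate a preference for other groups.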
7.
Genet Epidemiol ; 43(2): 180-188, 2019 Mar.
Article in English | MEDLINE | ID: mdl-30474154

ABSTRACT

Recent studies have examined the genetic correlations of single-nucleotide polymorphism (SNP) effect sizes across pairs of populations to better understand the genetic architectures of complex traits. These studies have estimated ρ_g, the cross-population correlation of joint-fit effect sizes at genotyped SNPs. However, the value of ρ_g depends both on the cross-population correlation of true causal effect sizes (ρ_b) and on the similarity in linkage disequilibrium (LD) patterns in the two populations, which drive tagging effects. Here, we derive the value of the ratio ρ_g/ρ_b as a function of LD in each population. By applying existing methods to obtain estimates of ρ_g, we can use this ratio to estimate ρ_b. Our estimates of ρ_b were equal to 0.55 (SE = 0.14) between Europeans and East Asians averaged across nine traits in the Genetic Epidemiology Research on Adult Health and Aging data set, 0.54 (SE = 0.18) between Europeans and South Asians averaged across 13 traits in the UK Biobank data set, and 0.48 (SE = 0.06) and 0.65 (SE = 0.09) between Europeans and East Asians in summary statistic data sets for type 2 diabetes and rheumatoid arthritis, respectively. These results implicate substantially different causal genetic architectures across continental populations.


Subjects
Genetics, Population; Adult; Aging/genetics; Arthritis, Rheumatoid/genetics; Biological Specimen Banks; Databases, Genetic; Diabetes Mellitus, Type 2/genetics; Genotype; Humans; Phenotype; Quantitative Trait, Heritable; United Kingdom
8.
Nat Genet ; 50(10): 1483-1493, 2018 10.
Article in English | MEDLINE | ID: mdl-30177862

ABSTRACT

Biological interpretation of genome-wide association study data frequently involves assessing whether SNPs linked to a biological process, for example, binding of a transcription factor, show unsigned enrichment for disease signal. However, signed annotations quantifying whether each SNP allele promotes or hinders the biological process can enable stronger statements about disease mechanism. We introduce a method, signed linkage disequilibrium profile regression, for detecting genome-wide directional effects of signed functional annotations on disease risk. We validate the method via simulations and application to molecular quantitative trait loci in blood, recovering known transcriptional regulators. We apply the method to expression quantitative trait loci in 48 Genotype-Tissue Expression tissues, identifying 651 transcription factor-tissue associations including 30 with robust evidence of tissue specificity. We apply the method to 46 diseases and complex traits (average n = 290 K), identifying 77 annotation-trait associations representing 12 independent transcription factor-trait associations, and characterize the underlying transcriptional programs using gene-set enrichment analyses. Our results implicate new causal disease genes and new disease mechanisms.


Subjects
Disease/genetics; Genome-Wide Association Study; Multifactorial Inheritance/genetics; Quantitative Trait Loci; Transcription Factors/metabolism; Binding Sites/genetics; Blood Cells/metabolism; Blood Cells/pathology; Blood Chemical Analysis; Gene Expression Regulation; Genetic Predisposition to Disease; Humans; Linkage Disequilibrium; Phenotype; Polymorphism, Single Nucleotide; Protein Binding; Risk Factors
9.
Nature ; 559(7714): 350-355, 2018 07.
Article in English | MEDLINE | ID: mdl-29995854

ABSTRACT

The selective pressures that shape clonal evolution in healthy individuals are largely unknown. Here we investigate 8,342 mosaic chromosomal alterations, from 50 kb to 249 Mb long, that we uncovered in blood-derived DNA from 151,202 UK Biobank participants using phase-based computational techniques (estimated false discovery rate, 6-9%). We found six loci at which inherited variants associated strongly with the acquisition of deletions or loss of heterozygosity in cis. At three such loci (MPL, TM2D3-TARSL2, and FRA10B), we identified a likely causal variant that acted with high penetrance (5-50%). Inherited alleles at one locus appeared to affect the probability of somatic mutation, and at three other loci to be objects of positive or negative clonal selection. Several specific mosaic chromosomal alterations were strongly associated with future haematological malignancies. Our results reveal a multitude of paths towards clonal expansions with a wide range of effects on human health.


Subjects
Chromosome Aberrations; Clone Cells/cytology; Clone Cells/metabolism; Hematopoiesis/genetics; Mosaicism; Adult; Aged; Alleles; Biological Specimen Banks; Chromosome Breakage; Chromosome Fragile Sites/genetics; Chromosomes, Human, Pair 10/genetics; Female; Health; Hematologic Neoplasms/genetics; Hematologic Neoplasms/mortality; Humans; Male; Middle Aged; Penetrance; United Kingdom
10.
Nat Genet ; 50(7): 1041-1047, 2018 07.
Article in English | MEDLINE | ID: mdl-29942083

ABSTRACT

There is increasing evidence that many risk loci found using genome-wide association studies are molecular quantitative trait loci (QTLs). Here we introduce a new set of functional annotations based on causal posterior probabilities of fine-mapped molecular cis-QTLs, using data from the Genotype-Tissue Expression (GTEx) and BLUEPRINT consortia. We show that these annotations are more strongly enriched for heritability (5.84× for eQTLs; P = 1.19 × 10^-31) across 41 diseases and complex traits than annotations containing all significant molecular QTLs (1.80× for expression (e)QTLs). eQTL annotations obtained by meta-analyzing all GTEx tissues generally performed best, whereas tissue-specific eQTL annotations produced stronger enrichments for blood- and brain-related diseases and traits. eQTL annotations restricted to loss-of-function intolerant genes were even more enriched for heritability (17.06×; P = 1.20 × 10^-35). All molecular QTLs except splicing QTLs remained significantly enriched in joint analysis, indicating that each of these annotations is uniquely informative for disease and complex trait architectures.


Subjects
Disease/genetics; Multifactorial Inheritance; Quantitative Trait Loci; Genome-Wide Association Study/methods; Humans; Phenotype; Polymorphism, Single Nucleotide; Quantitative Trait, Heritable
11.
Nat Genet ; 50(4): 621-629, 2018 04.
Article in English | MEDLINE | ID: mdl-29632380

ABSTRACT

We introduce an approach to identify disease-relevant tissues and cell types by analyzing gene expression data together with genome-wide association study (GWAS) summary statistics. Our approach uses stratified linkage disequilibrium (LD) score regression to test whether disease heritability is enriched in regions surrounding genes with the highest specific expression in a given tissue. We applied our approach to gene expression data from several sources together with GWAS summary statistics for 48 diseases and traits (average N = 169,331) and found significant tissue-specific enrichments (false discovery rate (FDR) < 5%) for 34 traits. In our analysis of multiple tissues, we detected a broad range of enrichments that recapitulated known biology. In our brain-specific analysis, significant enrichments included an enrichment of inhibitory over excitatory neurons for bipolar disorder, and excitatory over inhibitory neurons for schizophrenia and body mass index. Our results demonstrate that our polygenic approach is a powerful way to leverage gene expression data for interpreting GWAS signals.


Subjects
Gene Expression; Genetic Predisposition to Disease; Bipolar Disorder/genetics; Body Mass Index; Brain/metabolism; Chromatin/genetics; Epigenesis, Genetic; Gene Expression Profiling/statistics & numerical data; Genome-Wide Association Study/statistics & numerical data; Humans; Immune System Diseases/genetics; Linkage Disequilibrium; Models, Genetic; Multifactorial Inheritance; Neurons/metabolism; Schizophrenia/genetics; Tissue Distribution/genetics
12.
Nat Genet ; 50(4): 538-548, 2018 04.
Article in English | MEDLINE | ID: mdl-29632383

ABSTRACT

Genome-wide association studies (GWAS) have identified over 100 risk loci for schizophrenia, but the causal mechanisms remain largely unknown. We performed a transcriptome-wide association study (TWAS) integrating a schizophrenia GWAS of 79,845 individuals from the Psychiatric Genomics Consortium with expression data from brain, blood, and adipose tissues across 3,693 primarily control individuals. We identified 157 TWAS-significant genes, of which 35 did not overlap a known GWAS locus. Of these 157 genes, 42 were associated with specific chromatin features measured in independent samples, thus highlighting potential regulatory targets for follow-up. Suppression of one identified susceptibility gene, mapk3, in zebrafish showed a significant effect on neurodevelopmental phenotypes. Expression and splicing from the brain captured most of the TWAS effect across all genes. This large-scale connection of associations to target genes, tissues, and regulatory features is an essential step in moving toward a mechanistic understanding of GWAS.


Subjects
Chromatin/genetics; Schizophrenia/etiology; Schizophrenia/genetics; Animals; Brain/metabolism; Gene Dosage; Gene Expression Profiling/methods; Genetic Predisposition to Disease; Genome-Wide Association Study/methods; Humans; Kinesins; Microtubule-Associated Proteins/genetics; Mitogen-Activated Protein Kinase 3/genetics; Multifactorial Inheritance; Protein Phosphatase 2/genetics; Quantitative Trait Loci; Zebrafish/genetics; Zebrafish/growth & development; Zebrafish Proteins/genetics
13.
Genome Res ; 28(5): 739-750, 2018 05.
Article in English | MEDLINE | ID: mdl-29588361

ABSTRACT

Models for predicting phenotypic outcomes from genotypes have important applications to understanding genomic function and improving human health. Here, we develop a machine-learning system to predict cell-type-specific epigenetic and transcriptional profiles in large mammalian genomes from DNA sequence alone. By use of convolutional neural networks, this system identifies promoters and distal regulatory elements and synthesizes their content to make effective gene expression predictions. We show that model predictions for the influence of genomic variants on gene expression align well to causal variants underlying eQTLs in human populations and can be useful for generating mechanistic hypotheses to enable fine mapping of disease loci.


Subjects
Chromosomes/genetics; Computational Biology/methods; Neural Networks, Computer; Regulatory Sequences, Nucleic Acid/genetics; Animals; Epigenomics/methods; Gene Expression Profiling/methods; Gene Expression Regulation; Genomics/methods; Humans; Machine Learning; Models, Genetic; Polymorphism, Single Nucleotide; Promoter Regions, Genetic/genetics
14.
Nat Genet ; 48(11): 1443-1448, 2016 11.
Article in English | MEDLINE | ID: mdl-27694958

ABSTRACT

Haplotype phasing is a fundamental problem in medical and population genetics. Phasing is generally performed via statistical phasing in a genotyped cohort, an approach that can yield high accuracy in very large cohorts but attains lower accuracy in smaller cohorts. Here we instead explore the paradigm of reference-based phasing. We introduce a new phasing algorithm, Eagle2, that attains high accuracy across a broad range of cohort sizes by efficiently leveraging information from large external reference panels (such as the Haplotype Reference Consortium; HRC) using a new data structure based on the positional Burrows-Wheeler transform. We demonstrate that Eagle2 attains a ∼20× speedup and ∼10% increase in accuracy compared to reference-based phasing using SHAPEIT2. On European-ancestry samples, Eagle2 with the HRC panel achieves >2× the accuracy of 1000 Genomes-based phasing. Eagle2 is open source and freely available for HRC-based phasing via the Sanger Imputation Service and the Michigan Imputation Server.


Subjects
Algorithms; Haplotypes; Cohort Studies; Female; Genotype; Humans; Male; Reference Values
15.
Nat Genet ; 47(11): 1228-35, 2015 Nov.
Article in English | MEDLINE | ID: mdl-26414678

ABSTRACT

Recent work has demonstrated that some functional categories of the genome contribute disproportionately to the heritability of complex diseases. Here we analyze a broad set of functional elements, including cell type-specific elements, to estimate their polygenic contributions to heritability in genome-wide association studies (GWAS) of 17 complex diseases and traits with an average sample size of 73,599. To enable this analysis, we introduce a new method, stratified LD score regression, for partitioning heritability from GWAS summary statistics while accounting for linked markers. This new method is computationally tractable at very large sample sizes and leverages genome-wide information. Our findings include a large enrichment of heritability in conserved regions across many traits, a very large immunological disease-specific enrichment of heritability in FANTOM5 enhancers and many cell type-specific enrichments, including significant enrichment of central nervous system cell types in the heritability of body mass index, age at menarche, educational attainment and smoking behavior.


Subjects
Disease/genetics; Genetic Predisposition to Disease/genetics; Genome-Wide Association Study/methods; Polymorphism, Single Nucleotide; Algorithms; Computer Simulation; Female; Gene Frequency; Histones/metabolism; Humans; Inheritance Patterns; Lysine/metabolism; Male; Methylation; Models, Genetic
17.
J Comput Biol ; 19(9): 998-1014, 2012 Sep.
Article in English | MEDLINE | ID: mdl-22897201

ABSTRACT

Pedigree graphs, or family trees, are typically constructed by an expensive process of examining genealogical records to determine which pairs of individuals are parent and child. New methods to automate this process take as input genetic data from a set of extant individuals and reconstruct ancestral individuals. There is a great need to evaluate the quality of these methods by comparing the estimated pedigree to the true pedigree. In this article, we consider two main pedigree comparison problems. The first is the pedigree isomorphism problem, for which we present a linear-time algorithm for leaf-labeled pedigrees. The second is the pedigree edit distance problem, for which we present (1) several algorithms that are fast and exact in various special cases, and (2) a general, randomized heuristic algorithm. In the negative direction, we first prove that the pedigree isomorphism problem is as hard as the general graph isomorphism problem, and that the sub-pedigree isomorphism problem is NP-hard. We then show that the pedigree edit distance problem is APX-hard in general and NP-hard on leaf-labeled pedigrees. We use simulated pedigrees to compare our edit-distance algorithms to each other as well as to a branch-and-bound algorithm that always finds an optimal solution.


Subjects
Algorithms; Computer Simulation; Models, Genetic; Pedigree; Artificial Intelligence; Humans
18.
Science ; 334(6062): 1518-24, 2011 Dec 16.
Article in English | MEDLINE | ID: mdl-22174245

ABSTRACT

Identifying interesting relationships between pairs of variables in large data sets is increasingly important. Here, we present a measure of dependence for two-variable relationships: the maximal information coefficient (MIC). MIC captures a wide range of associations both functional and not, and for functional relationships provides a score that roughly equals the coefficient of determination (R^2) of the data relative to the regression function. MIC belongs to a larger class of maximal information-based nonparametric exploration (MINE) statistics for identifying and classifying relationships. We apply MIC and MINE to data sets in global health, gene expression, major-league baseball, and the human gut microbiota and identify known and novel relationships.


Subjects
Data Interpretation, Statistical; Algorithms; Animals; Baseball/statistics & numerical data; Female; Gene Expression; Genes, Fungal; Genomics/methods; Humans; Intestines/microbiology; Male; Metagenome; Mice; Obesity; Saccharomyces cerevisiae/genetics