2.
Nat Biotechnol ; 40(3): 355-363, 2022 03.
Article in English | MEDLINE | ID: mdl-34675423

ABSTRACT

As single-cell datasets grow in sample size, there is a critical need to characterize cell states that vary across samples and associate with sample attributes, such as clinical phenotypes. Current statistical approaches typically map cells to clusters and then assess differences in cluster abundance. Here we present co-varying neighborhood analysis (CNA), an unbiased method to identify associated cell populations with greater flexibility than cluster-based approaches. CNA characterizes dominant axes of variation across samples by identifying groups of small regions in transcriptional space, termed neighborhoods, that co-vary in abundance across samples, suggesting shared function or regulation. CNA performs statistical testing for associations between any sample-level attribute and the abundances of these co-varying neighborhood groups. Simulations show that CNA enables more sensitive and accurate identification of disease-associated cell states than a cluster-based approach. When applied to published datasets, CNA captures a Notch activation signature in rheumatoid arthritis, identifies monocyte populations expanded in sepsis and identifies a novel T cell population associated with progression to active tuberculosis.


Subject(s)
T-Lymphocytes; Transcriptome; Cluster Analysis; Phenotype; Transcriptome/genetics
3.
Bioinformatics ; 37(15): 2103-2111, 2021 Aug 09.
Article in English | MEDLINE | ID: mdl-33532840

ABSTRACT

MOTIVATION: Genome-wide association studies (GWASs) have identified thousands of common trait-associated genetic variants, but interpretation of their function remains challenging. These genetic variants can overlap the binding sites of transcription factors (TFs) and therefore could alter gene expression. However, we currently lack a systematic understanding of how this mechanism contributes to phenotype. RESULTS: We present Motif-Raptor, a TF-centric computational tool that integrates sequence-based predictive models, chromatin accessibility, gene expression datasets and GWAS summary statistics to systematically investigate how TF function is affected by genetic variants. Given trait-associated non-coding variants, Motif-Raptor can recover relevant cell types and critical TFs to drive hypotheses regarding their mechanism of action. We tested Motif-Raptor on complex traits such as rheumatoid arthritis and red blood cell count and demonstrated its ability to prioritize relevant cell types, potential regulatory TFs and non-coding SNPs that have been previously characterized and validated. AVAILABILITY AND IMPLEMENTATION: Motif-Raptor is freely available as a Python package at: https://github.com/pinellolab/MotifRaptor. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

4.
Nat Genet ; 52(12): 1355-1363, 2020 12.
Article in English | MEDLINE | ID: mdl-33199916

ABSTRACT

Fine-mapping aims to identify causal variants impacting complex traits. We propose PolyFun, a computationally scalable framework to improve fine-mapping accuracy by leveraging functional annotations across the entire genome (not just genome-wide-significant loci) to specify prior probabilities for fine-mapping methods such as SuSiE or FINEMAP. In simulations, PolyFun + SuSiE and PolyFun + FINEMAP were well calibrated and identified >20% more variants with a posterior causal probability >0.95 than their nonfunctionally informed counterparts. In analyses of 49 UK Biobank traits (average n = 318,000), PolyFun + SuSiE identified 3,025 fine-mapped variant-trait pairs with posterior causal probability >0.95, a >32% improvement versus SuSiE. We used posterior mean per-SNP heritabilities from PolyFun + SuSiE to perform polygenic localization, constructing minimal sets of common SNPs causally explaining 50% of common SNP heritability; these sets ranged in size from 28 (hair color) to 3,400 (height) to 2 million (number of children). In conclusion, PolyFun prioritizes variants for functional follow-up and provides insights into complex trait architectures.


Subject(s)
Chromosome Mapping/methods; Computational Biology/methods; Genome-Wide Association Study/methods; Multifactorial Inheritance/genetics; Genome, Human/genetics; Humans; Phenotype; Polymorphism, Single Nucleotide/genetics; Quantitative Trait Loci/genetics
5.
Hum Mol Genet ; 29(7): 1057-1067, 2020 05 08.
Article in English | MEDLINE | ID: mdl-31595288

ABSTRACT

Regulatory variation plays a major role in complex disease, and cell type-specific binding of transcription factors (TFs) is critical to gene regulation. However, assessing the contribution of genetic variation in TF-binding sites to disease heritability is challenging, as binding is often cell type-specific and annotations from directly measured TF binding are not currently available for most cell type-TF pairs. We investigate approaches to annotate TF binding, including directly measured chromatin data and sequence-based predictions. We find that TF-binding annotations constructed by intersecting sequence-based TF-binding predictions with cell type-specific chromatin data explain a large fraction of heritability across a broad set of diseases and corresponding cell types; this strategy of constructing annotations addresses both the limitation that identical sequences may be bound or unbound depending on surrounding chromatin context and the limitation that sequence-based predictions are generally not cell type-specific. We partitioned the heritability of 49 diseases and complex traits using stratified linkage disequilibrium (LD) score regression with the baseline-LD model (which is not cell type-specific) plus the new annotations. We determined that 100 bp windows around MotifMap sequence-based TF-binding predictions intersected with a union of six cell type-specific chromatin marks (imputed using ChromImpute) performed best, with a 58% increase in heritability enrichment compared to the chromatin marks alone (11.6× vs. 7.3×, P = 9 × 10⁻¹⁴ for difference) and a 20% increase in cell type-specific signal conditional on annotations from the baseline-LD model (P = 8 × 10⁻¹¹ for difference). Our results show that TF-binding annotations explain substantial disease heritability and can help refine genome-wide association signals.


Subject(s)
Chromatin/genetics; Genetic Diseases, Inborn/genetics; Molecular Sequence Annotation; Transcription Factors/genetics; Binding Sites/genetics; Computational Biology; Gene Expression Regulation/genetics; Genetic Diseases, Inborn/classification; Genetic Diseases, Inborn/pathology; Humans; Linkage Disequilibrium/genetics; Multifactorial Inheritance/genetics; Polymorphism, Single Nucleotide/genetics; Protein Binding/genetics
6.
J Med Internet Res ; 21(9): e13766, 2019 09 12.
Article in English | MEDLINE | ID: mdl-31516124

ABSTRACT

BACKGROUND: The structure of the sexual networks and partnership characteristics of young black men who have sex with men (MSM) may be contributing to their high risk of contracting HIV in the United States. Assortative mixing, which refers to the tendency of individuals to have partners from their own group, has been proposed as a potential explanation for disparities. OBJECTIVE: The objective of this study was to identify the age- and race-related search patterns of users of a diverse geosocial networking mobile app in seven metropolitan areas in the United States to understand the disparities in sexually transmitted infection and HIV risk in MSM communities. METHODS: Data were collected on user behavior between November 2015 and May 2016. Data pertaining to behavior on the app were collected for men who had searched for partners with at least one search parameter narrowed from defaults or used the app to send at least one private chat message and used the app at least once during the study period. Newman assortativity coefficient (R) was calculated from the study data to understand assortativity patterns of men by race. Pearson correlation coefficient was used to assess assortativity patterns by age. Heat maps were used to visualize the relationship between searcher's and candidate's characteristics by age band, race, or age band and race. RESULTS: From November 2015 through May 2016, there were 2,989,737 searches in all seven metropolitan areas among 122,417 searchers. Assortativity by age was important for looking at the profiles of candidates with correlation coefficients ranging from 0.284 (Birmingham) to 0.523 (San Francisco). Men tended to look at the profiles of candidates that matched their race in a highly assortative manner with R ranging from 0.310 (Birmingham) to 0.566 (Los Angeles). For the initiation of chats, race appeared to be slightly assortative for some groups with R ranging from 0.023 (Birmingham) to 0.305 (Los Angeles). Asian searchers were most assortative in initiating chats with Asian candidates in Boston, Los Angeles, New York, and San Francisco. In Birmingham and Tampa, searchers from all races tended to initiate chats with black candidates. CONCLUSIONS: Our results indicate that the age preferences of MSM are relatively consistent across cities, that is, younger MSM are more likely to be chatted with and have their profiles viewed compared with older MSM, but the patterns of racial mixing are more variable. Although some generalizations can be made regarding Web-based behaviors across all cities, city-specific usage patterns and trends should be analyzed to create targeted and localized interventions that may make the most difference in the lives of MSM in these areas.
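The Newman assortativity coefficient named in the methods can be illustrated with a minimal sketch. It assumes the standard categorical definition, r = (Σᵢ eᵢᵢ − Σᵢ aᵢbᵢ) / (1 − Σᵢ aᵢbᵢ), where e is the normalized searcher-by-candidate mixing matrix and a and b are its row and column sums. The function name and the toy counts are hypothetical and are not taken from the study.

```python
def newman_assortativity(counts):
    """Newman's assortativity coefficient for a square mixing matrix.

    counts[i][j] is the number of interactions from group i to group j
    (hypothetical input layout, e.g. searcher race by candidate race).
    """
    n = len(counts)
    if n == 0 or any(len(row) != n for row in counts):
        raise ValueError("counts must be a non-empty square matrix")
    total = sum(sum(row) for row in counts)
    if total <= 0:
        raise ValueError("counts must contain at least one interaction")

    # Normalize counts to fractions e[i][j] of all interactions.
    e = [[c / total for c in row] for row in counts]
    a = [sum(row) for row in e]                             # row marginals
    b = [sum(e[i][j] for i in range(n)) for j in range(n)]  # column marginals

    observed = sum(e[i][i] for i in range(n))     # within-group fraction
    expected = sum(a[i] * b[i] for i in range(n)) # expected under random mixing
    if abs(1.0 - expected) < 1e-12:
        raise ValueError("coefficient is undefined when only one group is present")
    return (observed - expected) / (1.0 - expected)


# Toy example with made-up counts for two groups.
toy_counts = [[40, 10], [10, 40]]
print(newman_assortativity(toy_counts))
```

A value of 1 corresponds to interactions occurring only within groups, 0 to mixing at the rate expected from the marginals, and negative values to a preference for other groups.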


Subject(s)
HIV Infections/prevention & control; Mobile Applications; Sexual Behavior; Sexual Partners; Sexually Transmitted Diseases/prevention & control; Social Networking; Adolescent; Adult; Black or African American; Cities; HIV Infections/transmission; Health Promotion; Homosexuality, Male; Humans; Male; Sexual and Gender Minorities; Sexually Transmitted Diseases/transmission; United States; Urban Population; Young Adult
7.
Genet Epidemiol ; 43(2): 180-188, 2019 Mar.
Article in English | MEDLINE | ID: mdl-30474154

ABSTRACT

Recent studies have examined the genetic correlations of single-nucleotide polymorphism (SNP) effect sizes across pairs of populations to better understand the genetic architectures of complex traits. These studies have estimated ρ_g, the cross-population correlation of joint-fit effect sizes at genotyped SNPs. However, the value of ρ_g depends both on the cross-population correlation of true causal effect sizes (ρ_b) and on the similarity in linkage disequilibrium (LD) patterns in the two populations, which drive tagging effects. Here, we derive the value of the ratio ρ_g/ρ_b as a function of LD in each population. By applying existing methods to obtain estimates of ρ_g, we can use this ratio to estimate ρ_b. Our estimates of ρ_b were equal to 0.55 (SE = 0.14) between Europeans and East Asians averaged across nine traits in the Genetic Epidemiology Research on Adult Health and Aging data set, 0.54 (SE = 0.18) between Europeans and South Asians averaged across 13 traits in the UK Biobank data set, and 0.48 (SE = 0.06) and 0.65 (SE = 0.09) between Europeans and East Asians in summary statistic data sets for type 2 diabetes and rheumatoid arthritis, respectively. These results implicate substantially different causal genetic architectures across continental populations.


Subject(s)
Genetics, Population; Adult; Aging/genetics; Arthritis, Rheumatoid/genetics; Biological Specimen Banks; Databases, Genetic; Diabetes Mellitus, Type 2/genetics; Genotype; Humans; Phenotype; Quantitative Trait, Heritable; United Kingdom
8.
Nat Genet ; 50(10): 1483-1493, 2018 10.
Article in English | MEDLINE | ID: mdl-30177862

ABSTRACT

Biological interpretation of genome-wide association study data frequently involves assessing whether SNPs linked to a biological process, for example, binding of a transcription factor, show unsigned enrichment for disease signal. However, signed annotations quantifying whether each SNP allele promotes or hinders the biological process can enable stronger statements about disease mechanism. We introduce a method, signed linkage disequilibrium profile regression, for detecting genome-wide directional effects of signed functional annotations on disease risk. We validate the method via simulations and application to molecular quantitative trait loci in blood, recovering known transcriptional regulators. We apply the method to expression quantitative trait loci in 48 Genotype-Tissue Expression tissues, identifying 651 transcription factor-tissue associations including 30 with robust evidence of tissue specificity. We apply the method to 46 diseases and complex traits (average n = 290 K), identifying 77 annotation-trait associations representing 12 independent transcription factor-trait associations, and characterize the underlying transcriptional programs using gene-set enrichment analyses. Our results implicate new causal disease genes and new disease mechanisms.


Subject(s)
Disease/genetics; Genome-Wide Association Study; Multifactorial Inheritance/genetics; Quantitative Trait Loci; Transcription Factors/metabolism; Binding Sites/genetics; Blood Cells/metabolism; Blood Cells/pathology; Blood Chemical Analysis; Gene Expression Regulation; Genetic Predisposition to Disease; Humans; Linkage Disequilibrium; Phenotype; Polymorphism, Single Nucleotide; Protein Binding; Risk Factors
9.
Nature ; 559(7714): 350-355, 2018 07.
Article in English | MEDLINE | ID: mdl-29995854

ABSTRACT

The selective pressures that shape clonal evolution in healthy individuals are largely unknown. Here we investigate 8,342 mosaic chromosomal alterations, from 50 kb to 249 Mb long, that we uncovered in blood-derived DNA from 151,202 UK Biobank participants using phase-based computational techniques (estimated false discovery rate, 6-9%). We found six loci at which inherited variants associated strongly with the acquisition of deletions or loss of heterozygosity in cis. At three such loci (MPL, TM2D3-TARSL2, and FRA10B), we identified a likely causal variant that acted with high penetrance (5-50%). Inherited alleles at one locus appeared to affect the probability of somatic mutation, and at three other loci to be objects of positive or negative clonal selection. Several specific mosaic chromosomal alterations were strongly associated with future haematological malignancies. Our results reveal a multitude of paths towards clonal expansions with a wide range of effects on human health.


Subject(s)
Chromosome Aberrations; Clone Cells/cytology; Clone Cells/metabolism; Hematopoiesis/genetics; Mosaicism; Adult; Aged; Alleles; Biological Specimen Banks; Chromosome Breakage; Chromosome Fragile Sites/genetics; Chromosomes, Human, Pair 10/genetics; Female; Health; Hematologic Neoplasms/genetics; Hematologic Neoplasms/mortality; Humans; Male; Middle Aged; Penetrance; United Kingdom
10.
Nat Genet ; 50(7): 1041-1047, 2018 07.
Article in English | MEDLINE | ID: mdl-29942083

ABSTRACT

There is increasing evidence that many risk loci found using genome-wide association studies are molecular quantitative trait loci (QTLs). Here we introduce a new set of functional annotations based on causal posterior probabilities of fine-mapped molecular cis-QTLs, using data from the Genotype-Tissue Expression (GTEx) and BLUEPRINT consortia. We show that these annotations are more strongly enriched for heritability (5.84× for eQTLs; P = 1.19 × 10⁻³¹) across 41 diseases and complex traits than annotations containing all significant molecular QTLs (1.80× for expression (e)QTLs). eQTL annotations obtained by meta-analyzing all GTEx tissues generally performed best, whereas tissue-specific eQTL annotations produced stronger enrichments for blood- and brain-related diseases and traits. eQTL annotations restricted to loss-of-function intolerant genes were even more enriched for heritability (17.06×; P = 1.20 × 10⁻³⁵). All molecular QTLs except splicing QTLs remained significantly enriched in joint analysis, indicating that each of these annotations is uniquely informative for disease and complex trait architectures.


Subject(s)
Disease/genetics; Multifactorial Inheritance; Quantitative Trait Loci; Genome-Wide Association Study/methods; Humans; Phenotype; Polymorphism, Single Nucleotide; Quantitative Trait, Heritable
11.
Nat Genet ; 50(4): 621-629, 2018 04.
Article in English | MEDLINE | ID: mdl-29632380

ABSTRACT

We introduce an approach to identify disease-relevant tissues and cell types by analyzing gene expression data together with genome-wide association study (GWAS) summary statistics. Our approach uses stratified linkage disequilibrium (LD) score regression to test whether disease heritability is enriched in regions surrounding genes with the highest specific expression in a given tissue. We applied our approach to gene expression data from several sources together with GWAS summary statistics for 48 diseases and traits (average N = 169,331) and found significant tissue-specific enrichments (false discovery rate (FDR) < 5%) for 34 traits. In our analysis of multiple tissues, we detected a broad range of enrichments that recapitulated known biology. In our brain-specific analysis, significant enrichments included an enrichment of inhibitory over excitatory neurons for bipolar disorder, and excitatory over inhibitory neurons for schizophrenia and body mass index. Our results demonstrate that our polygenic approach is a powerful way to leverage gene expression data for interpreting GWAS signals.


Subject(s)
Gene Expression; Genetic Predisposition to Disease; Bipolar Disorder/genetics; Body Mass Index; Brain/metabolism; Chromatin/genetics; Epigenesis, Genetic; Gene Expression Profiling/statistics & numerical data; Genome-Wide Association Study/statistics & numerical data; Humans; Immune System Diseases/genetics; Linkage Disequilibrium; Models, Genetic; Multifactorial Inheritance; Neurons/metabolism; Schizophrenia/genetics; Tissue Distribution/genetics
12.
Nat Genet ; 50(4): 538-548, 2018 04.
Article in English | MEDLINE | ID: mdl-29632383

ABSTRACT

Genome-wide association studies (GWAS) have identified over 100 risk loci for schizophrenia, but the causal mechanisms remain largely unknown. We performed a transcriptome-wide association study (TWAS) integrating a schizophrenia GWAS of 79,845 individuals from the Psychiatric Genomics Consortium with expression data from brain, blood, and adipose tissues across 3,693 primarily control individuals. We identified 157 TWAS-significant genes, of which 35 did not overlap a known GWAS locus. Of these 157 genes, 42 were associated with specific chromatin features measured in independent samples, thus highlighting potential regulatory targets for follow-up. Suppression of one identified susceptibility gene, mapk3, in zebrafish showed a significant effect on neurodevelopmental phenotypes. Expression and splicing from the brain captured most of the TWAS effect across all genes. This large-scale connection of associations to target genes, tissues, and regulatory features is an essential step in moving toward a mechanistic understanding of GWAS.


Subject(s)
Chromatin/genetics; Schizophrenia/etiology; Schizophrenia/genetics; Animals; Brain/metabolism; Gene Dosage; Gene Expression Profiling/methods; Genetic Predisposition to Disease; Genome-Wide Association Study/methods; Humans; Kinesins; Microtubule-Associated Proteins/genetics; Mitogen-Activated Protein Kinase 3/genetics; Multifactorial Inheritance; Protein Phosphatase 2/genetics; Quantitative Trait Loci; Zebrafish/genetics; Zebrafish/growth & development; Zebrafish Proteins/genetics
13.
Genome Res ; 28(5): 739-750, 2018 05.
Article in English | MEDLINE | ID: mdl-29588361

ABSTRACT

Models for predicting phenotypic outcomes from genotypes have important applications to understanding genomic function and improving human health. Here, we develop a machine-learning system to predict cell-type-specific epigenetic and transcriptional profiles in large mammalian genomes from DNA sequence alone. By use of convolutional neural networks, this system identifies promoters and distal regulatory elements and synthesizes their content to make effective gene expression predictions. We show that model predictions for the influence of genomic variants on gene expression align well to causal variants underlying eQTLs in human populations and can be useful for generating mechanistic hypotheses to enable fine mapping of disease loci.


Subject(s)
Chromosomes/genetics; Computational Biology/methods; Neural Networks, Computer; Regulatory Sequences, Nucleic Acid/genetics; Animals; Epigenomics/methods; Gene Expression Profiling/methods; Gene Expression Regulation; Genomics/methods; Humans; Machine Learning; Models, Genetic; Polymorphism, Single Nucleotide; Promoter Regions, Genetic/genetics
14.
Nat Genet ; 48(11): 1443-1448, 2016 11.
Article in English | MEDLINE | ID: mdl-27694958

ABSTRACT

Haplotype phasing is a fundamental problem in medical and population genetics. Phasing is generally performed via statistical phasing in a genotyped cohort, an approach that can yield high accuracy in very large cohorts but attains lower accuracy in smaller cohorts. Here we instead explore the paradigm of reference-based phasing. We introduce a new phasing algorithm, Eagle2, that attains high accuracy across a broad range of cohort sizes by efficiently leveraging information from large external reference panels (such as the Haplotype Reference Consortium; HRC) using a new data structure based on the positional Burrows-Wheeler transform. We demonstrate that Eagle2 attains a ∼20× speedup and ∼10% increase in accuracy compared to reference-based phasing using SHAPEIT2. On European-ancestry samples, Eagle2 with the HRC panel achieves >2× the accuracy of 1000 Genomes-based phasing. Eagle2 is open source and freely available for HRC-based phasing via the Sanger Imputation Service and the Michigan Imputation Server.
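The positional Burrows-Wheeler transform mentioned above can be illustrated with a minimal sketch of its core step: at each site, haplotypes are stably re-sorted so that those sharing the longest matching prefix ending at that site become adjacent. This is a generic textbook-style illustration under that assumption, not Eagle2's data structure or code; the function name and the toy haplotypes are hypothetical.

```python
def positional_prefix_arrays(haplotypes):
    """Return the haplotype ordering before each site and after the last one.

    haplotypes is a list of equal-length sequences of 0/1 alleles.
    orders[k] lists haplotype indices sorted by their reversed prefix
    ending just before site k.
    """
    n_hap = len(haplotypes)
    n_sites = len(haplotypes[0]) if n_hap else 0
    if any(len(h) != n_sites for h in haplotypes):
        raise ValueError("all haplotypes must have the same number of sites")

    order = list(range(n_hap))
    orders = [order]
    for k in range(n_sites):
        zeros, ones = [], []
        # Stable partition by the allele at site k: keeps the previous
        # ordering within each allele class.
        for idx in order:
            if haplotypes[idx][k]:
                ones.append(idx)
            else:
                zeros.append(idx)
        order = zeros + ones
        orders.append(order)
    return orders


# Toy panel of four haplotypes over two sites (made-up data).
panel = [[0, 1], [1, 0], [0, 0], [1, 1]]
for k, order in enumerate(positional_prefix_arrays(panel)):
    print(k, order)
```

Each update is a single linear pass over the haplotypes, which is why structures built on this ordering scale to large reference panels.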


Subject(s)
Algorithms; Haplotypes; Cohort Studies; Female; Genotype; Humans; Male; Reference Values
15.
Nat Genet ; 47(11): 1228-35, 2015 Nov.
Article in English | MEDLINE | ID: mdl-26414678

ABSTRACT

Recent work has demonstrated that some functional categories of the genome contribute disproportionately to the heritability of complex diseases. Here we analyze a broad set of functional elements, including cell type-specific elements, to estimate their polygenic contributions to heritability in genome-wide association studies (GWAS) of 17 complex diseases and traits with an average sample size of 73,599. To enable this analysis, we introduce a new method, stratified LD score regression, for partitioning heritability from GWAS summary statistics while accounting for linked markers. This new method is computationally tractable at very large sample sizes and leverages genome-wide information. Our findings include a large enrichment of heritability in conserved regions across many traits, a very large immunological disease-specific enrichment of heritability in FANTOM5 enhancers and many cell type-specific enrichments, including significant enrichment of central nervous system cell types in the heritability of body mass index, age at menarche, educational attainment and smoking behavior.
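The abstract does not state the regression equation. As a sketch, the commonly cited form of the stratified LD score regression model relates the expected association statistic of SNP j to its LD scores with each functional category C. Here N is the sample size, τ_C is the per-SNP heritability contribution of category C, a captures confounding, and r_jk is the correlation between SNPs j and k; the symbols follow the usual presentation of the method rather than anything given in this record.

```latex
\[
\mathbb{E}\!\left[\chi^2_j\right] \;=\; N \sum_{C} \tau_C \,\ell(j, C) \;+\; N a \;+\; 1,
\qquad
\ell(j, C) \;=\; \sum_{k \in C} r_{jk}^{2}
\]
```

Regressing the observed χ² statistics on the category-specific LD scores ℓ(j, C) yields estimates of τ_C, from which the heritability attributed to each category can be computed.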


Subject(s)
Disease/genetics; Genetic Predisposition to Disease/genetics; Genome-Wide Association Study/methods; Polymorphism, Single Nucleotide; Algorithms; Computer Simulation; Female; Gene Frequency; Histones/metabolism; Humans; Inheritance Patterns; Lysine/metabolism; Male; Methylation; Models, Genetic
17.
J Comput Biol ; 19(9): 998-1014, 2012 Sep.
Article in English | MEDLINE | ID: mdl-22897201

ABSTRACT

Pedigree graphs, or family trees, are typically constructed by an expensive process of examining genealogical records to determine which pairs of individuals are parent and child. New methods to automate this process take as input genetic data from a set of extant individuals and reconstruct ancestral individuals. There is a great need to evaluate the quality of these methods by comparing the estimated pedigree to the true pedigree. In this article, we consider two main pedigree comparison problems. The first is the pedigree isomorphism problem, for which we present a linear-time algorithm for leaf-labeled pedigrees. The second is the pedigree edit distance problem, for which we present (1) several algorithms that are fast and exact in various special cases, and (2) a general, randomized heuristic algorithm. In the negative direction, we first prove that the pedigree isomorphism problem is as hard as the general graph isomorphism problem, and that the sub-pedigree isomorphism problem is NP-hard. We then show that the pedigree edit distance problem is APX-hard in general and NP-hard on leaf-labeled pedigrees. We use simulated pedigrees to compare our edit-distance algorithms to each other as well as to a branch-and-bound algorithm that always finds an optimal solution.


Subject(s)
Algorithms; Computer Simulation; Models, Genetic; Pedigree; Artificial Intelligence; Humans
18.
Science ; 334(6062): 1518-24, 2011 Dec 16.
Article in English | MEDLINE | ID: mdl-22174245

ABSTRACT

Identifying interesting relationships between pairs of variables in large data sets is increasingly important. Here, we present a measure of dependence for two-variable relationships: the maximal information coefficient (MIC). MIC captures a wide range of associations both functional and not, and for functional relationships provides a score that roughly equals the coefficient of determination (R²) of the data relative to the regression function. MIC belongs to a larger class of maximal information-based nonparametric exploration (MINE) statistics for identifying and classifying relationships. We apply MIC and MINE to data sets in global health, gene expression, major-league baseball, and the human gut microbiota and identify known and novel relationships.
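The abstract does not give the definition of MIC. As a sketch of the usual formulation, for a data set D of n points, I*(D, x, y) denotes the largest mutual information achievable by any grid with x columns and y rows placed on the scatterplot, and B(n) bounds the grid resolution. The default B(n) = n^0.6 is the value commonly associated with the method and should be treated as an assumption here.

```latex
\[
\mathrm{MIC}(D) \;=\; \max_{x y \,<\, B(n)} \; \frac{I^{*}(D, x, y)}{\log \min\{x, y\}},
\qquad B(n) = n^{0.6}
\]
```

Dividing by log min{x, y} normalizes each grid's score to the interval [0, 1], so that scores from grids of different resolutions are comparable.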


Subject(s)
Data Interpretation, Statistical; Algorithms; Animals; Baseball/statistics & numerical data; Female; Gene Expression; Genes, Fungal; Genomics/methods; Humans; Intestines/microbiology; Male; Metagenome; Mice; Obesity; Saccharomyces cerevisiae/genetics