ABSTRACT
The LRRK2 G2019S variant is the most common cause of monogenic Parkinson's disease (PD); however, questions remain regarding the penetrance, clinical phenotype and natural history of carriers. We performed a 3.5-year prospective longitudinal online study of 1286 genotyped LRRK2 G2019S carriers and 109 154 controls, with and without PD, recruited from the 23andMe Research Cohort. We collected self-reported motor and non-motor symptoms every 6 months, as well as demographics, family histories and environmental risk factors. Incident cases of PD (phenoconverters) were identified at follow-up. We determined lifetime risk of PD using accelerated failure time modelling and explored the impact of polygenic risk on penetrance. We also computed the genetic ancestry of all LRRK2 G2019S carriers in the 23andMe database and identified regions of the world where carrier frequencies are highest. We observed that despite a 1 year longer disease duration (P = 0.016), LRRK2 G2019S carriers with PD had a similar burden of motor symptoms, yet significantly fewer non-motor symptoms including cognitive difficulties, REM sleep behaviour disorder (RBD) and hyposmia (all P-values ≤ 0.0002). The cumulative incidence of PD in G2019S carriers by age 80 was 49%. G2019S carriers had a 10-fold risk of developing PD versus non-carriers. This rose to a 27-fold risk in G2019S carriers with a PD polygenic risk score in the top 25% versus non-carriers in the bottom 25%. In addition to identifying ancient founding events in people of North African and Ashkenazi descent, our genetic ancestry analyses infer that the G2019S variant was later introduced to Spanish colonial territories in the Americas. Our results suggest that LRRK2 G2019S PD is a slowly progressive, predominantly motor subtype of PD with a lower prevalence of hyposmia, RBD and cognitive impairment.
This suggests that the current prodromal criteria, which are based on idiopathic PD, may lack sensitivity to detect the early phases of LRRK2 PD in G2019S carriers. We show that polygenic burden may contribute to the development of PD in the LRRK2 G2019S carrier population. Collectively, the results should help support screening programmes and candidate enrichment strategies for upcoming trials of LRRK2 inhibitors in early-stage disease.
Subject(s)
Leucine-Rich Repeat Serine-Threonine Protein Kinase-2 , Parkinson Disease , Humans , Leucine-Rich Repeat Serine-Threonine Protein Kinase-2/genetics , Parkinson Disease/genetics , Female , Male , Middle Aged , Aged , Longitudinal Studies , Genetic Predisposition to Disease/genetics , Adult , Prospective Studies , Heterozygote , Penetrance , Aged, 80 and over , REM Sleep Behavior Disorder/genetics , Mutation
ABSTRACT
Estimating the genomic location and length of identical-by-descent (IBD) segments among individuals is a crucial step in many genetic analyses. However, the exponential growth in the size of biobank and direct-to-consumer genetic data sets makes accurate IBD inference a significant computational challenge. Here we present the templated positional Burrows-Wheeler transform (TPBWT) to make fast IBD estimates robust to genotype and phasing errors. Using haplotype data simulated over pedigrees with realistic genotyping and phasing errors, we show that the TPBWT outperforms other state-of-the-art IBD inference algorithms in terms of speed and accuracy. For each phase-aware method, we explore the false positive and false negative rates of inferring IBD by segment length and characterize the types of error commonly found. Our results highlight the fragility of most phased IBD inference methods; the accuracy of IBD estimates can be highly sensitive to the quality of haplotype phasing. Additionally, we compare the performance of the TPBWT against a widely used phase-free IBD inference approach that is robust to phasing errors. We introduce both in-sample and out-of-sample TPBWT-based IBD inference algorithms and demonstrate their computational efficiency on massive-scale data sets with millions of samples. Furthermore, we describe the binary file format for TPBWT-compressed haplotypes that results in fast and efficient out-of-sample IBD computes against very large cohort panels. Finally, we demonstrate the utility of the TPBWT in a brief empirical analysis, exploring geographic patterns of haplotype sharing within Mexico. Hierarchical clustering of IBD shared across regions within Mexico reveals geographically structured haplotype sharing and a strong signal of isolation by distance. Our software implementation of the TPBWT is freely available for noncommercial use in the code repository (https://github.com/23andMe/phasedibd, last accessed January 11, 2021).
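As background for the abstract above, the following is a minimal sketch of the plain positional Burrows-Wheeler transform that the TPBWT builds on. It is not the templated variant and it is not the phasedibd implementation; the function name `pbwt_arrays` and the toy input are hypothetical. It sweeps the sites once, keeping haplotypes sorted by their reversed prefixes and tracking where each haplotype's match with its neighbour in that order begins.

```python
def pbwt_arrays(haplotypes):
    """Return (order, divergence) after sweeping all sites.

    haplotypes: list of equal-length lists of 0/1 alleles (hypothetical toy input).
    order[i] is the index of the haplotype ranked i-th by reversed prefix.
    divergence[i] is the first site of the longest match, ending at the last
    site, between order[i] and order[i - 1].
    """
    n_haps = len(haplotypes)
    n_sites = len(haplotypes[0])
    order = list(range(n_haps))
    divergence = [0] * n_haps
    for k in range(n_sites):
        zeros, ones = [], []          # haplotypes carrying 0 / 1 at site k
        div_zeros, div_ones = [], []  # their match start positions
        p = q = k + 1                 # k + 1 means "no match with predecessor"
        for idx, start in zip(order, divergence):
            p = max(p, start)
            q = max(q, start)
            if haplotypes[idx][k] == 0:
                zeros.append(idx)
                div_zeros.append(p)
                p = 0
            else:
                ones.append(idx)
                div_ones.append(q)
                q = 0
        # Stable partition: all 0-carriers first, then all 1-carriers.
        order = zeros + ones
        divergence = div_zeros + div_ones
    return order, divergence


toy = [[0, 1, 0, 1], [1, 1, 0, 1], [0, 0, 1, 0], [1, 1, 0, 1]]
order, divergence = pbwt_arrays(toy)
```

In the toy panel, haplotypes 1 and 3 are identical, so they end up adjacent in `order` with a match that starts at site 0. Long runs between neighbours in this ordering are the raw material for candidate IBD segments; tolerating genotype and phasing errors is the part the TPBWT adds and is not shown here.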
Subject(s)
Genome, Human , Haplotypes , Software , Algorithms , False Negative Reactions , False Positive Reactions , Humans , Mexico , Phylogeography
ABSTRACT
Sex-biased demographic events ("sex-bias") involve unequal numbers of females and males. These events are typically inferred from the relative amount of X-chromosomal to autosomal genetic variation and have led to conflicting conclusions about human demographic history. Though population size changes alter the relative amount of X-chromosomal to autosomal genetic diversity even in the absence of sex-bias, this has generally not been accounted for in sex-bias estimators to date. Here, we present a novel method to identify sex-bias from genetic sequence data that models population size changes and estimates the female fraction of the effective population size during each time epoch. Compared to recent sex-bias inference methods, our approach can detect sex-bias that changes on a single population branch without requiring data from an outgroup or knowledge of divergence events. When applied to simulated data, conventional sex-bias estimators are biased by population size changes, especially recent growth or bottlenecks, while our estimator is unbiased. We next apply our method to high-coverage exome data from the 1000 Genomes Project and estimate a male bias in Yorubans (47% female) and Europeans (44%), possibly due to stronger background selection on the X chromosome than on the autosomes. Finally, we apply our method to the 1000 Genomes Project Phase 3 high-coverage Complete Genomics whole-genome data and estimate a female bias in Yorubans (63% female), Europeans (84%), Punjabis (82%), as well as Peruvians (56%), and a male bias in the Southern Han Chinese (45%). Our method additionally identifies a male-biased migration out of Africa based on data from Europeans (20% female). Our results demonstrate that modeling population size change is necessary to estimate sex-bias parameters accurately. Our approach gives insight into signatures of sex-bias in sexual species, and the demographic models it produces can serve as more accurate null models for tests of selection.
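To make the quantity behind the abstract above concrete, here is a small sketch of the classical constant-size expectation for the ratio of X-chromosomal to autosomal effective population size as a function of the female fraction. This is the textbook relationship, not the paper's estimator, which additionally models population size change; the function names are hypothetical.

```python
def x_to_autosome_ne_ratio(female_fraction):
    """Expected N_e(X) / N_e(autosome) for a constant-size population.

    Uses N_A = 4*Nf*Nm / (Nf + Nm) and N_X = 9*Nf*Nm / (4*Nm + 2*Nf),
    which simplify to 9 / (8 * (2 - p)) with p = Nf / (Nf + Nm).
    """
    if not 0.0 <= female_fraction <= 1.0:
        raise ValueError("female_fraction must lie in [0, 1]")
    return 9.0 / (8.0 * (2.0 - female_fraction))


def female_fraction_from_ratio(ratio):
    """Invert the relationship above: p = 2 - 9 / (8 * ratio)."""
    return 2.0 - 9.0 / (8.0 * ratio)


balanced = x_to_autosome_ne_ratio(0.5)  # 0.75 when the sexes are equal in number
```

Under this simple model the ratio can only range from 9/16 (all male) to 9/8 (all female). Population size changes move the observed diversity ratio even when the female fraction is fixed, which is the source of bias the abstract describes.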
Subject(s)
Demography/methods , Genetics, Population/methods , Sequence Analysis, DNA/methods , Bias , Chromosomes, Human, X/genetics , Female , Genetic Variation/genetics , Genome/genetics , Humans , Male , Models, Genetic , Population Density , Selection, Genetic/genetics , Whole Genome Sequencing/methods
ABSTRACT
We present a comprehensive assessment of genomic diversity in the African-American population by studying three genotyped cohorts comprising 3,726 African-Americans from across the United States that provide a representative description of the population across all US states and socioeconomic status. An estimated 82.1% of ancestors to African-Americans lived in Africa prior to the advent of transatlantic travel, 16.7% in Europe, and 1.2% in the Americas, with increased African ancestry in the southern United States compared to the North and West. Combining demographic models of ancestry and those of relatedness suggests that admixture occurred predominantly in the South prior to the Civil War and that ancestry-biased migration is responsible for regional differences in ancestry. We find that recent migrations also caused a strong increase in genetic relatedness among geographically distant African-Americans. Long-range relatedness among African-Americans and between African-Americans and European-Americans thus track north- and west-bound migration routes followed during the Great Migration of the twentieth century. By contrast, short-range relatedness patterns suggest comparable mobility of ~15-16 km per generation for African-Americans and European-Americans, as estimated using a novel analytical model of isolation-by-distance.
Subject(s)
Black or African American/genetics , Genetics, Population , Genomics , Black People/genetics , Demography , Europe , Gene Frequency , Genotype , Human Migration , Humans , Polymorphism, Single Nucleotide/genetics , United States
ABSTRACT
BACKGROUND: Medulloblastoma is associated with rare hereditary cancer predisposition syndromes; however, consensus medulloblastoma predisposition genes have not been defined and screening guidelines for genetic counselling and testing for paediatric patients are not available. We aimed to assess and define these genes to provide evidence for future screening guidelines. METHODS: In this international, multicentre study, we analysed patients with medulloblastoma from retrospective cohorts (International Cancer Genome Consortium [ICGC] PedBrain, Medulloblastoma Advanced Genomics International Consortium [MAGIC], and the CEFALO series) and from prospective cohorts from four clinical studies (SJMB03, SJMB12, SJYC07, and I-HIT-MED). Whole-genome sequences and exome sequences from blood and tumour samples were analysed for rare damaging germline mutations in cancer predisposition genes. DNA methylation profiling was done to determine consensus molecular subgroups: WNT (MBWNT), SHH (MBSHH), group 3 (MBGroup3), and group 4 (MBGroup4). Medulloblastoma predisposition genes were predicted on the basis of rare variant burden tests against controls without a cancer diagnosis from the Exome Aggregation Consortium (ExAC). Previously defined somatic mutational signatures were used to further classify medulloblastoma genomes into two groups, a clock-like group (signatures 1 and 5) and a homologous recombination repair deficiency-like group (signatures 3 and 8), and chromothripsis was investigated using previously established criteria. Progression-free survival and overall survival were modelled for patients with a genetic predisposition to medulloblastoma. FINDINGS: We included a total of 1022 patients with medulloblastoma from the retrospective cohorts (n=673) and the four prospective studies (n=349), from whom blood samples (n=1022) and tumour samples (n=800) were analysed for germline mutations in 110 cancer predisposition genes. 
We compared these against 53 105 sequenced controls from ExAC in a rare variant burden analysis, identified APC, BRCA2, PALB2, PTCH1, SUFU, and TP53 as consensus medulloblastoma predisposition genes, and estimated that germline mutations accounted for 6% of medulloblastoma diagnoses in the retrospective cohort. The prevalence of genetic predispositions differed between molecular subgroups in the retrospective cohort and was highest for patients in the MBSHH subgroup (20% in the retrospective cohort). These estimates were replicated in the prospective clinical cohort (germline mutations accounted for 5% of medulloblastoma diagnoses, with the highest prevalence [14%] in the MBSHH subgroup). Patients with germline APC mutations developed MBWNT and accounted for most (five [71%] of seven) cases of MBWNT that had no somatic CTNNB1 exon 3 mutations. Patients with germline mutations in SUFU and PTCH1 mostly developed infant MBSHH. Germline TP53 mutations presented only in childhood patients in the MBSHH subgroup and explained more than half (eight [57%] of 14) of all chromothripsis events in this subgroup. Germline mutations in PALB2 and BRCA2 were observed across the MBSHH, MBGroup3, and MBGroup4 molecular subgroups and were associated with mutational signatures typical of homologous recombination repair deficiency. In patients with a genetic predisposition to medulloblastoma, 5-year progression-free survival was 52% (95% CI 40-69) and 5-year overall survival was 65% (95% CI 52-81); these survival estimates differed significantly across patients with germline mutations in different medulloblastoma predisposition genes. INTERPRETATION: Genetic counselling and testing should be used as a standard-of-care procedure in patients with MBWNT and MBSHH because these patients have the highest prevalence of damaging germline mutations in known cancer predisposition genes.
We propose criteria for routine genetic screening for patients with medulloblastoma based on clinical and molecular tumour characteristics. FUNDING: German Cancer Aid; German Federal Ministry of Education and Research; German Childhood Cancer Foundation (Deutsche Kinderkrebsstiftung); European Research Council; National Institutes of Health; Canadian Institutes for Health Research; German Cancer Research Center; St Jude Comprehensive Cancer Center; American Lebanese Syrian Associated Charities; Swiss National Science Foundation; European Molecular Biology Organization; Cancer Research UK; Hertie Foundation; Alexander and Margaret Stewart Trust; V Foundation for Cancer Research; Sontag Foundation; Musicians Against Childhood Cancer; BC Cancer Foundation; Swedish Council for Health, Working Life and Welfare; Swedish Research Council; Swedish Cancer Society; the Swedish Radiation Protection Authority; Danish Strategic Research Council; Swiss Federal Office of Public Health; Swiss Research Foundation on Mobile Communication; Masaryk University; Ministry of Health of the Czech Republic; Research Council of Norway; Genome Canada; Genome BC; Terry Fox Research Institute; Ontario Institute for Cancer Research; Pediatric Oncology Group of Ontario; The Family of Kathleen Lorette and the Clark H Smith Brain Tumour Centre; Montreal Children's Hospital Foundation; The Hospital for Sick Children: Sonia and Arthur Labatt Brain Tumour Research Centre, Chief of Research Fund, Cancer Genetics Program, Garron Family Cancer Centre, MDT's Garron Family Endowment; BC Childhood Cancer Parents Association; Cure Search Foundation; Pediatric Brain Tumor Foundation; Brainchild; and the Government of Ontario.
Subject(s)
Biomarkers, Tumor/genetics , Cerebellar Neoplasms/genetics , DNA Methylation , Genetic Testing/methods , Germ-Line Mutation , Medulloblastoma/genetics , Models, Genetic , Adolescent , Adult , Cerebellar Neoplasms/mortality , Cerebellar Neoplasms/pathology , Cerebellar Neoplasms/therapy , Child , Child, Preschool , DNA Mutational Analysis , Female , Gene Expression Profiling , Genetic Predisposition to Disease , Heredity , Humans , Infant , Male , Medulloblastoma/mortality , Medulloblastoma/pathology , Medulloblastoma/therapy , Pedigree , Phenotype , Predictive Value of Tests , Progression-Free Survival , Prospective Studies , Reproducibility of Results , Retrospective Studies , Risk Factors , Transcriptome , Exome Sequencing , Young Adult
ABSTRACT
The human genetics community needs robust protocols that enable secure sharing of genomic data from participants in genetic research. Beacons are web servers that answer allele-presence queries--such as "Do you have a genome that has a specific nucleotide (e.g., A) at a specific genomic position (e.g., position 11,272 on chromosome 1)?"--with either "yes" or "no." Here, we show that individuals in a beacon are susceptible to re-identification even if the only data shared include presence or absence information about alleles in a beacon. Specifically, we propose a likelihood-ratio test of whether a given individual is present in a given genetic beacon. Our test is not dependent on allele frequencies and is the most powerful test for a specified false-positive rate. Through simulations, we showed that in a beacon with 1,000 individuals, re-identification is possible with just 5,000 queries. Relatives can also be identified in the beacon. Re-identification is possible even in the presence of sequencing errors and variant-calling differences. In a beacon constructed with 65 European individuals from the 1000 Genomes Project, we demonstrated that it is possible to detect membership in the beacon with just 250 SNPs. With just 1,000 SNP queries, we were able to detect the presence of an individual genome from the Personal Genome Project in an existing beacon. Our results show that beacons can disclose membership and implied phenotypic information about participants and do not protect privacy a priori. We discuss risk mitigation through policies and standards such as not allowing anonymous pings of genetic beacons and requiring minimum beacon sizes.
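The membership test described above can be illustrated with a simplified log-likelihood ratio over yes/no beacon responses. This is a sketch under stated assumptions, not necessarily the authors' exact statistic: the function name is hypothetical, and `d_n`, `d_n_minus_1` and `delta` are placeholder inputs standing for the probability that no genome in a beacon of N (or N - 1) individuals carries the queried allele and for the probability that an error hides the target's own allele.

```python
import math


def beacon_log_likelihood_ratio(responses, d_n, d_n_minus_1, delta):
    """log L(target in beacon) - log L(target not in beacon).

    responses: iterable of booleans, one per queried site at which the target
        genome carries the queried allele (True means the beacon said "yes").
    d_n, d_n_minus_1, delta: assumed probabilities, each strictly in (0, 1).
    """
    for name, value in (("d_n", d_n), ("d_n_minus_1", d_n_minus_1), ("delta", delta)):
        if not 0.0 < value < 1.0:
            raise ValueError(f"{name} must lie strictly between 0 and 1")
    p_yes_absent = 1.0 - d_n                    # someone else must carry the allele
    p_yes_present = 1.0 - delta * d_n_minus_1   # "no" needs an error and no other carrier
    llr = 0.0
    for said_yes in responses:
        if said_yes:
            llr += math.log(p_yes_present) - math.log(p_yes_absent)
        else:
            llr += math.log(1.0 - p_yes_present) - math.log(1.0 - p_yes_absent)
    return llr


# Hypothetical numbers: mostly-rare alleles, low error rate, three "yes" answers.
evidence = beacon_log_likelihood_ratio([True, True, True], 0.80, 0.81, 0.01)
```

Each "yes" at a site where few other genomes are expected to carry the allele adds positive evidence of membership, which is why a modest number of well-chosen queries can suffice. A decision threshold would be calibrated to the desired false-positive rate.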
Subject(s)
Genetic Privacy , Genetic Variation , Genome, Human , Information Dissemination/methods , Task Performance and Analysis , Haplotypes , High-Throughput Nucleotide Sequencing , Humans
ABSTRACT
Motivation: Variant calling from next-generation sequencing (NGS) data is susceptible to false positive calls due to sequencing, mapping and other errors. To better distinguish true from false positive calls, we present a method that uses genotype array data from the sequenced samples, rather than public data such as HapMap or dbSNP, to train an accurate classifier using Random Forests. We demonstrate our method on a set of variant calls obtained from 642 African-ancestry genomes from the Consortium on Asthma among African-ancestry Populations in the Americas (CAAPA), sequenced to high depth (30X). Results: We have applied our classifier to compare call sets generated with different calling methods, including both single-sample and multi-sample callers. At a False Positive Rate of 5%, our method determines true positive rates of 97.5%, 95% and 99% on variant calls obtained using Illumina's single-sample caller CASAVA, Real Time Genomics multisample variant caller, and the GATK UnifiedGenotyper, respectively. Since NGS sequencing data may be accompanied by genotype data for the same samples, either collected concurrently with sequencing or from a previous study, our method can be trained on each dataset to provide a more accurate computational validation of site calls compared to generic methods. Moreover, our method allows for adjustment based on allele frequency (e.g. a different set of criteria to determine quality for rare versus common variants) and thereby provides insight into sequencing characteristics that indicate call quality for variants of different frequencies. Availability and Implementation: Code is available on GitHub at: https://github.com/suyashss/variant_validation. Contacts: suyashs@stanford.edu or mtaub@jhsph.edu. Supplementary information: Supplementary data are available at Bioinformatics online.
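The headline metric in the abstract above, true positive rate at a fixed false positive rate, can be computed from any classifier's scores as in the following sketch. It does not reproduce the Random Forest training step or the code in the linked repository; the function name and the toy scores are hypothetical.

```python
def tpr_at_fpr(scores, labels, max_fpr):
    """True positive rate at the loosest threshold whose FPR is <= max_fpr.

    scores: classifier scores, higher meaning "more likely a true variant".
    labels: truthy for calls confirmed by array genotypes, falsy otherwise.
    """
    negatives = sorted((s for s, y in zip(scores, labels) if not y), reverse=True)
    positives = [s for s, y in zip(scores, labels) if y]
    if not negatives or not positives:
        raise ValueError("need at least one positive and one negative call")
    allowed = int(max_fpr * len(negatives))  # false positives we may accept
    # Calls scoring strictly above this threshold are accepted.
    threshold = negatives[allowed] if allowed < len(negatives) else float("-inf")
    accepted = sum(1 for s in positives if s > threshold)
    return accepted / len(positives)


toy_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
toy_labels = [1, 1, 0, 1, 1, 0, 0, 0]
strict = tpr_at_fpr(toy_scores, toy_labels, 0.0)
lenient = tpr_at_fpr(toy_scores, toy_labels, 0.25)
```

Comparing callers at a common false positive rate, as the abstract does, avoids the ambiguity of comparing raw accuracy when each caller's scores are on a different scale.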
Subject(s)
High-Throughput Nucleotide Sequencing/methods , Polymorphism, Single Nucleotide , Whole Genome Sequencing/methods , Data Accuracy , Genome, Human , Genomics/methods , Genomics/standards , Genotype , Genotyping Techniques/methods , Genotyping Techniques/standards , High-Throughput Nucleotide Sequencing/standards , Humans , Whole Genome Sequencing/standards
ABSTRACT
BACKGROUND: A number of large genomic datasets are being generated for studies of human ancestry and diseases. The ADMIXTURE program is commonly used to infer individual ancestry from genomic data. RESULTS: We describe two improvements to the ADMIXTURE software. The first enables ADMIXTURE to infer ancestry for a new set of individuals using cluster allele frequencies from a reference set of individuals. Using data from the 1000 Genomes Project, we show that this allows ADMIXTURE to infer ancestry for 10,920 individuals in a few hours (a 5× speedup). This mode also allows ADMIXTURE to correctly estimate individual ancestry and allele frequencies from a set of related individuals. The second modification allows ADMIXTURE to correctly handle X-chromosome (and other haploid) data from both males and females. We demonstrate increased power to detect sex-biased admixture in African-American individuals from the 1000 Genomes Project using this extension. CONCLUSIONS: These modifications make ADMIXTURE more efficient and versatile, allowing users to extract more information from large genomic datasets.
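The projection idea in the abstract above, estimating one individual's ancestry proportions while holding reference allele frequencies fixed, can be sketched with a plain EM update. This is an illustrative stand-in: ADMIXTURE itself uses a different, accelerated optimizer, and the function name, the diploid-only model and the toy frequencies below are assumptions.

```python
def project_ancestry(genotypes, freqs, n_iter=200):
    """Estimate ancestry proportions for one diploid individual.

    genotypes: allele counts (0, 1 or 2) at each site.
    freqs: freqs[k][j] is the fixed reference frequency of the counted allele
        in cluster k at site j, assumed to lie strictly between 0 and 1.
    """
    n_clusters = len(freqs)
    n_sites = len(genotypes)
    q = [1.0 / n_clusters] * n_clusters  # start from equal proportions
    for _ in range(n_iter):
        new_q = [0.0] * n_clusters
        for j, g in enumerate(genotypes):
            p_alt = sum(q[k] * freqs[k][j] for k in range(n_clusters))
            p_ref = 1.0 - p_alt
            for k in range(n_clusters):
                # Expected number of this individual's alleles drawn from cluster k.
                new_q[k] += g * q[k] * freqs[k][j] / p_alt
                new_q[k] += (2 - g) * q[k] * (1.0 - freqs[k][j]) / p_ref
        q = [value / (2.0 * n_sites) for value in new_q]
    return q


reference = [[0.9, 0.9, 0.1, 0.1], [0.1, 0.1, 0.9, 0.9]]
q_unadmixed = project_ancestry([2, 2, 0, 0], reference)
q_mixed = project_ancestry([1, 1, 1, 1], reference)
```

Because the frequencies are never re-estimated, each individual can be processed independently, which is consistent with the speedup and the robustness to related individuals that the abstract reports.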
Subject(s)
Genetics, Population , Genomics/methods , Software , Black or African American/genetics , Female , Gene Frequency , HapMap Project , Humans , Male , Southwestern United States
ABSTRACT
PNAS article classification is rooted in long-standing disciplinary divisions that do not necessarily reflect the structure of modern scientific research. We reevaluate that structure using latent pattern models from statistical machine learning, also known as mixed-membership models, that identify semantic structure in co-occurrence of words in the abstracts and references. Our findings suggest that the latent dimensionality of patterns underlying PNAS research articles in the Biological Sciences is only slightly larger than the number of categories currently in use, but it differs substantially in the content of the categories. Further, the number of articles that are listed under multiple categories is only a small fraction of what it should be. These findings together with the sensitivity analyses suggest ways to reconceptualize the organization of papers published in PNAS.
Subject(s)
Periodicals as Topic/classification , Publications/classification , Classification , Methods , National Academy of Sciences, U.S. , Statistics as Topic , United States
ABSTRACT
Atopic dermatitis (AD) is a common inflammatory skin condition and prior genome-wide association studies (GWAS) have identified 71 associated loci. In the current study we conducted the largest AD GWAS to date (discovery N = 1,086,394, replication N = 3,604,027), combining previously reported cohorts with additional available data. We identified 81 loci (29 novel) in the European-only analysis (which all replicated in a separate European analysis) and 10 additional loci in the multi-ancestry analysis (3 novel). Eight variants from the multi-ancestry analysis replicated in at least one of the populations tested (European, Latino or African), while two may be specific to individuals of Japanese ancestry. AD loci showed enrichment for DNase I hypersensitivity and eQTL associations in blood. At each locus we prioritised candidate genes by integrating multi-omic data. The implicated genes are predominantly in immune pathways of relevance to atopic inflammation and some offer drug repurposing opportunities.
Subject(s)
Dermatitis, Atopic , Genome-Wide Association Study , Humans , Dermatitis, Atopic/genetics , Genetic Predisposition to Disease/genetics , Hispanic or Latino/genetics , Black People , Polymorphism, Single Nucleotide
ABSTRACT
MOTIVATION: Clustering of genotype data is an important way of understanding similarities and differences between populations. A summary of populations through clustering allows us to make inferences about the evolutionary history of the populations. Many methods have been proposed to perform clustering on multilocus genotype data. However, most of these methods do not directly address the question of how many clusters the data should be divided into and leave that choice to the user. METHODS: We present StructHDP, which is a method for automatically inferring the number of clusters from genotype data in the presence of admixture. Our method is an extension of two existing methods, Structure and Structurama. Using a Hierarchical Dirichlet Process (HDP), we model the presence of admixture of an unknown number of ancestral populations in a given sample of genotype data. We use a Gibbs sampler to perform inference on the resulting model and infer the ancestry proportions and the number of clusters that best explain the data. RESULTS: To demonstrate our method, we simulated data from an island model using the neutral coalescent. Comparing the results of StructHDP with Structurama shows the utility of combining HDPs with the Structure model. We used StructHDP to analyze a dataset of 155 Taita thrush, Turdus helleri, which has been previously analyzed using Structure and Structurama. StructHDP correctly picks the optimal number of populations to cluster the data. The clustering based on the inferred ancestry proportions also agrees with that inferred using Structure for the optimal number of populations. We also analyzed data from 1048 individuals from the Human Genome Diversity project from 53 world populations. We found that the clusters obtained correspond with major geographical divisions of the world, which is in agreement with previous analyses of the dataset. AVAILABILITY: StructHDP is written in C++. 
The code will be available for download at http://www.sailing.cs.cmu.edu/structhdp. CONTACT: suyash@cs.cmu.edu; epxing@cs.cmu.edu.
Subject(s)
Genetics, Population , Models, Genetic , Songbirds/genetics , Animals , Cluster Analysis , Genotype , Humans , Population Groups
ABSTRACT
A substantial proportion of the adult United States population with type 2 diabetes (T2D) are undiagnosed, calling into question the comprehensiveness of current screening practices, which primarily rely on age, family history, and body mass index (BMI). We hypothesized that a polygenic score (PGS) may serve as a complementary tool to identify high-risk individuals. The T2D polygenic score maintained predictive utility after adjusting for family history, and combining genetics with family history further improved disease risk prediction. We observed that the PGS was meaningfully related to age of onset with implications for screening practices: there was a linear and statistically significant relationship between the PGS and T2D onset (-1.3 years per standard deviation of the PGS). Evaluation of U.S. Preventive Services Task Force and a simplified version of American Diabetes Association screening guidelines showed that addition of a screening criterion for those above the 90th percentile of the PGS provided a small increase in the sensitivity of the screening algorithm. Among T2D-negative individuals, the T2D PGS was associated with prediabetes, where each standard deviation increase of the PGS was associated with a 23% increase in the odds of prediabetes diagnosis. Additionally, each standard deviation increase in the PGS corresponded to a 43% increase in the odds of incident T2D at one-year follow-up. Using complications and forms of clinical intervention (i.e., lifestyle modification, metformin treatment, or insulin treatment) as proxies for advanced illness, we also found statistically significant associations between the T2D PGS and insulin treatment and diabetic neuropathy. Importantly, we were able to replicate many findings in a Hispanic/Latino cohort from our database, highlighting the value of the T2D PGS as a clinical tool for individuals with ancestry other than European.
In this group, the T2D PGS provided additional disease risk information beyond that offered by traditional screening methodologies. The T2D PGS also had predictive value for the age of onset and for prediabetes among T2D-negative Hispanic/Latino participants. These findings strengthen the notion that a T2D PGS could play a role in the clinical setting across multiple ancestries, potentially improving T2D screening practices, risk stratification, and disease management.
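The per-standard-deviation effects quoted in the abstract above can be turned into rough numbers for a given score. The sketch below assumes that the reported odds increases compound multiplicatively per standard deviation (the usual reading of a logistic model) and that the onset relationship is linear as stated; the function names are hypothetical and the output is illustrative, not a clinical prediction.

```python
def odds_multiplier(increase_per_sd, z_score):
    """Multiplicative change in odds for a PGS z_score SDs above the mean.

    increase_per_sd: fractional odds increase per SD, e.g. 0.23 for the
        reported 23% increase in the odds of prediabetes.
    """
    return (1.0 + increase_per_sd) ** z_score


def onset_shift_years(z_score, years_per_sd=-1.3):
    """Expected shift in age of T2D onset under the reported linear trend."""
    return years_per_sd * z_score


prediabetes_odds_at_2sd = odds_multiplier(0.23, 2)  # about 1.51 times the odds
incident_t2d_odds_at_2sd = odds_multiplier(0.43, 2)  # about 2.04 times the odds
earlier_onset_at_2sd = onset_shift_years(2)          # about 2.6 years earlier
```

Note that these are odds, not probabilities; converting to absolute risk would require a baseline rate that the abstract does not give.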
ABSTRACT
There is currently a dearth of accessible whole genome sequencing (WGS) data for individuals residing in the Americas with Sub-Saharan African ancestry. We generated whole genome sequencing data at intermediate (15×) coverage for 2,294 individuals with large amounts of Sub-Saharan African ancestry, predominantly Atlantic African admixed with varying amounts of European and American ancestry. We performed extensive comparisons of variant callers, phasing algorithms, and variant filtration on these data to construct a high quality imputation panel containing data from 2,269 unrelated individuals. With the exception of the TOPMed imputation server (which notably cannot be downloaded), our panel substantially outperformed other available panels when imputing African American individuals. The raw sequencing data, variant calls and imputation panel for this cohort are all freely available via dbGaP and should prove an invaluable resource for further study of admixed African genetics.
Subject(s)
Genome, Human , Genotype , Adult , Black or African American , Aged , Aged, 80 and over , Humans , Middle Aged , United States , Whole Genome Sequencing , Young Adult
ABSTRACT
Depression and anxiety are highly prevalent and comorbid psychiatric traits that cause considerable burden worldwide. Here we use factor analysis and genomic structural equation modelling to investigate the genetic factor structure underlying 28 items assessing depression, anxiety and neuroticism, a closely related personality trait. Symptoms of depression and anxiety loaded on two distinct, although highly genetically correlated factors, and neuroticism items were partitioned between them. We used this factor structure to conduct genome-wide association analyses on latent factors of depressive symptoms (89 independent variants, 61 genomic loci) and anxiety symptoms (102 variants, 73 loci) in the UK Biobank. Of these associated variants, 72% and 78%, respectively, replicated in an independent cohort of approximately 1.9 million individuals with self-reported diagnosis of depression and anxiety. We use these results to characterize shared and trait-specific genetic associations. Our findings provide insight into the genetic architecture of depression and anxiety and comorbidity between them.
Subject(s)
Anxiety , Behavioral Symptoms , Depression , Neuroticism/physiology , Anxiety/diagnosis , Anxiety/epidemiology , Anxiety/genetics , Behavioral Symptoms/diagnosis , Behavioral Symptoms/psychology , Comorbidity , Depression/diagnosis , Depression/epidemiology , Depression/genetics , Factor Analysis, Statistical , Genetic Predisposition to Disease , Genome-Wide Association Study , Humans , Latent Class Analysis , Symptom Assessment/methods , Symptom Assessment/statistics & numerical data
ABSTRACT
Major depressive disorder is the most common neuropsychiatric disorder, affecting 11% of veterans. Here we report results of a large meta-analysis of depression using data from the Million Veteran Program, 23andMe, UK Biobank and FinnGen, including individuals of European ancestry (n = 1,154,267; 340,591 cases) and African ancestry (n = 59,600; 25,843 cases). Transcriptome-wide association study analyses revealed significant associations with expression of NEGR1 in the hypothalamus and DRD2 in the nucleus accumbens, among others. We fine-mapped 178 genomic risk loci, and we identified likely pathogenicity in these variants and overlapping gene expression for 17 genes from our transcriptome-wide association study, including TRAF3. Finally, we were able to show substantial replications of our findings in a large independent cohort (n = 1,342,778) provided by 23andMe. This study sheds light on the genetic architecture of depression and provides new insight into the interrelatedness of complex psychiatric traits.
Subject(s)
Depressive Disorder, Major/genetics , Genetic Predisposition to Disease/genetics , Female , Genome-Wide Association Study , Humans , Male , Veterans
ABSTRACT
Irritable bowel syndrome (IBS) results from disordered brain-gut interactions. Identifying susceptibility genes could highlight the underlying pathophysiological mechanisms. We designed a digestive health questionnaire for UK Biobank and combined the IBS cases it identified with independent cohorts. We conducted a genome-wide association study with 53,400 cases and 433,201 controls and replicated significant associations in a 23andMe panel (205,252 cases and 1,384,055 controls). Our study identified and confirmed six genetic susceptibility loci for IBS. Implicated genes included NCAM1, CADM2, PHF2/FAM120A, DOCK9, CKAP2/TPTE2P3 and BAG6. The first four are associated with mood and anxiety disorders, expressed in the nervous system, or both. Mirroring this, we also found strong genome-wide correlation between the risk of IBS and anxiety, neuroticism and depression (rg > 0.5). Additional analyses suggested that this correlation arises from shared pathogenic pathways rather than, for example, anxiety causing abdominal symptoms. Implicated mechanisms require further exploration to help understand the altered brain-gut interactions underlying IBS.
Subject(s)
Anxiety Disorders/genetics , Irritable Bowel Syndrome/genetics , Mood Disorders/genetics , Aged , CD56 Antigen/genetics , Cell Adhesion Molecules/genetics , Cytoskeletal Proteins/genetics , Female , Genetic Predisposition to Disease , Genome-Wide Association Study , Guanine Nucleotide Exchange Factors/genetics , Homeodomain Proteins/genetics , Humans , Irritable Bowel Syndrome/epidemiology , Male , Middle Aged , Molecular Chaperones/genetics , Polymorphism, Single Nucleotide , United Kingdom/epidemiology
ABSTRACT
Traditional methods for analyzing population structure, such as the Structure program, ignore the effect of mutations between the ancestral and current alleles of genetic markers, which can dramatically affect the accuracy of the structural estimation of current populations. Studying these effects can also reveal additional information about population evolution, such as the divergence time and migration history of admixed populations. We propose mStruct, an admixture of population-specific mixtures of inheritance models that addresses the task of structure inference and mutation estimation jointly through a hierarchical Bayesian framework, and a variational algorithm for inference. We validated our method on synthetic data and used it to analyze the Human Genome Diversity Project-Centre d'Etude du Polymorphisme Humain (HGDP-CEPH) cell line panel of microsatellites and HGDP single-nucleotide polymorphism (SNP) data. A comparison of the structural maps of world populations estimated by mStruct and Structure is presented, and we also report potentially interesting mutation patterns in world populations estimated by mStruct.
Subject(s)
Alleles , Computational Biology/methods , Models, Genetic , Mutation/genetics , Software , Algorithms , Human Genome Project , Humans , Microsatellite Repeats/genetics , Phylogeny , Polymorphism, Single Nucleotide/genetics , Racial Groups/genetics , Reproducibility of Results
ABSTRACT
Population structure is a commonplace feature of genetic variation data, and it is important in numerous application areas, including evolutionary genetics, conservation genetics, and human genetics. Understanding the structure in a sample is necessary before more sophisticated analyses are undertaken. Here we provide a protocol for running principal component analysis (PCA) and admixture proportion inference, two of the most commonly used approaches for describing population structure. Through hands-on examples with the CEPH-Human Genome Diversity Panel and pragmatic caveats, readers will learn to analyze and visualize population structure in their own data.
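As a rough illustration of the PCA step this abstract describes, the following minimal sketch runs PCA on a small simulated genotype matrix. It is not the protocol from the paper: the function names (`genotype_pca`, `simulate_two_populations`), the simulated allele frequencies, and the choice to standardize each SNP by its estimated allele frequency before the decomposition are all assumptions made for this example.

```python
import numpy as np

def genotype_pca(genotypes, n_components=2):
    """PCA on an (individuals x SNPs) matrix of 0/1/2 allele counts.

    Hypothetical helper: each SNP is centered at twice its sample allele
    frequency and scaled by the binomial standard deviation before the SVD.
    """
    g = np.asarray(genotypes, dtype=float)
    p = g.mean(axis=0) / 2.0                  # per-SNP allele frequency
    keep = (p > 0) & (p < 1)                  # drop monomorphic SNPs
    g, p = g[:, keep], p[keep]
    z = (g - 2 * p) / np.sqrt(2 * p * (1 - p))
    u, s, _ = np.linalg.svd(z, full_matrices=False)
    pcs = u[:, :n_components] * s[:n_components]   # individual scores
    explained = (s ** 2) / np.sum(s ** 2)           # variance fractions
    return pcs, explained[:n_components]

def simulate_two_populations(n_per_pop=50, n_snps=500, seed=0):
    """Simulate two populations whose allele frequencies have drifted apart."""
    rng = np.random.default_rng(seed)
    freq_a = rng.uniform(0.1, 0.9, n_snps)
    freq_b = np.clip(freq_a + rng.normal(0, 0.15, n_snps), 0.05, 0.95)
    pop_a = rng.binomial(2, freq_a, size=(n_per_pop, n_snps))
    pop_b = rng.binomial(2, freq_b, size=(n_per_pop, n_snps))
    labels = np.array([0] * n_per_pop + [1] * n_per_pop)
    return np.vstack([pop_a, pop_b]), labels

genotypes, labels = simulate_two_populations()
pcs, explained = genotype_pca(genotypes)
for pop in (0, 1):
    print(f"population {pop}: mean PC1 = {pcs[labels == pop, 0].mean():.2f}")
```

In this toy setting the first principal component should place the two simulated populations on opposite sides of zero; real data would typically need quality control and linkage disequilibrium pruning first, which the sketch omits.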
Subject(s)
Genetics, Population/methods , Polymorphism, Single Nucleotide , Computational Biology , Genome, Human , Humans , Models, Genetic , Principal Component Analysis
ABSTRACT
Functional turnover of transcription factor binding sites (TFBSs), such as whole-motif loss or gain, is a common event during genome evolution. Conventional probabilistic phylogenetic shadowing methods model the evolution of genomes only at the nucleotide level, and lack the ability to capture the evolutionary dynamics of functional turnover of aligned sequence entities. As a result, comparative genomic search of non-conserved motifs across evolutionarily related taxa remains a difficult challenge, especially in higher eukaryotes, where the cis-regulatory regions containing motifs can be long and divergent; existing methods rely heavily on specialized pattern-driven heuristic search or sampling algorithms, which can be difficult to generalize and hard to interpret based on phylogenetic principles. We propose a new method: Conditional Shadowing via Multi-resolution Evolutionary Trees, or CSMET, which uses a context-dependent probabilistic graphical model that allows aligned sites from different taxa in a multiple alignment to be modeled by either a background or an appropriate motif phylogeny, conditioning on the functional specifications of each taxon. The functional specifications themselves are the output of a phylogeny which models the evolution not of individual nucleotides, but of the overall functionality (e.g., functional retention or loss) of the aligned sequence segments over lineages. Combining this method with a hidden Markov model that autocorrelates evolutionary rates on successive sites in the genome, CSMET offers a principled way to take into consideration lineage-specific evolution of TFBSs during motif detection, and a readily computable analytical form of the posterior distribution of motifs under TFBS turnover. On both simulated and real Drosophila cis-regulatory modules, CSMET outperforms other state-of-the-art comparative genomic motif finders.
Subject(s)
Algorithms , Chromosome Mapping/methods , Evolution, Molecular , Models, Genetic , Sequence Analysis, DNA/methods , Software , Transcription Factors/genetics , Amino Acid Motifs , Binding Sites , Computer Simulation , Genetic Variation/genetics , Phylogeny , Protein Binding
ABSTRACT
The emergence of very large cohorts in genomic research has facilitated a focus on genotype-imputation strategies to power rare variant association. These strategies have benefited from improvements in imputation methods and association tests; however, little attention has been paid to ways in which array design can increase rare variant association power. Therefore, we developed a novel framework to select tag SNPs using the reference panel of 26 populations from Phase 3 of the 1000 Genomes Project. We evaluate tag SNP performance via mean imputed r² at untyped sites using leave-one-out internal validation and standard imputation methods, rather than pairwise linkage disequilibrium. Moving beyond pairwise metrics allows us to account for haplotype diversity across the genome to improve imputation accuracy and demonstrates population-specific biases from pairwise estimates. We also examine array design strategies that contrast multi-ethnic cohorts vs. single populations, and show that a boost in performance for the former can be obtained by prioritizing tag SNPs that contribute information across multiple populations simultaneously. Using our framework, we demonstrate increased imputation accuracy for rare variants (frequency < 1%) by 0.5-3.1% for an array of one million sites and 0.7-7.1% for an array of 500,000 sites, depending on the population. Finally, we show how recent explosive growth in non-African populations means tag SNPs capture on average 30% fewer other variants than in African populations. The unified framework presented here will enable investigators to make informed decisions for the design of new arrays, and help empower the next phase of rare variant association for global health.
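To make the evaluation metric concrete, the sketch below computes a per-site imputation r² as the squared Pearson correlation between true genotypes and imputed dosages, then averages it over sites. This is only one common way to define the metric and is an assumption here; the abstract does not give the exact formula, and the leave-one-out imputation itself is not implemented. The function names and the toy arrays are hypothetical.

```python
import numpy as np

def imputation_r2(true_genotypes, imputed_dosages):
    """Per-site squared correlation between true genotypes and imputed dosages.

    Both inputs are (individuals x sites) arrays. Sites with no variation in
    either array get NaN, because the correlation is undefined there.
    """
    t = np.asarray(true_genotypes, dtype=float)
    d = np.asarray(imputed_dosages, dtype=float)
    t_c = t - t.mean(axis=0)
    d_c = d - d.mean(axis=0)
    cov = (t_c * d_c).sum(axis=0)
    denom = np.sqrt((t_c ** 2).sum(axis=0) * (d_c ** 2).sum(axis=0))
    r2 = np.full(t.shape[1], np.nan)
    ok = denom > 0
    r2[ok] = (cov[ok] / denom[ok]) ** 2
    return r2

def mean_imputation_r2(true_genotypes, imputed_dosages):
    """Average r² over sites where it is defined."""
    return float(np.nanmean(imputation_r2(true_genotypes, imputed_dosages)))

# Toy data (made up): site 0 is imputed perfectly, site 1 is monomorphic,
# and site 2 is imputed imperfectly.
true_g = np.array([[0, 2, 0],
                   [1, 2, 1],
                   [2, 2, 2],
                   [1, 2, 0]])
imputed = np.array([[0.0, 2.0, 0.5],
                    [1.0, 2.0, 0.5],
                    [2.0, 2.0, 1.5],
                    [1.0, 2.0, 1.0]])

per_site = imputation_r2(true_g, imputed)
print("per-site r2:", per_site)
print("mean r2:", mean_imputation_r2(true_g, imputed))
```

Under this definition, comparing candidate tag SNP sets would amount to imputing the untyped sites from each set and comparing the resulting mean r², for example within allele frequency bins.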