ABSTRACT
High-throughput proteomics platforms measuring thousands of proteins in plasma combined with genomic and phenotypic information have the power to bridge the gap between the genome and diseases. Here we performed association studies of Olink Explore 3072 data generated by the UK Biobank Pharma Proteomics Project [1] on plasma samples from more than 50,000 UK Biobank participants with phenotypic and genotypic data, stratifying on British or Irish, African and South Asian ancestries. We compared the results with those of a SomaScan v4 study on plasma from 36,000 Icelandic people [2], for 1,514 of whom Olink data were also available. We found modest correlation between the two platforms. Although cis protein quantitative trait loci were detected for a similar absolute number of assays on the two platforms (2,101 on Olink versus 2,120 on SomaScan), the proportion of assays with such supporting evidence for assay performance was higher on the Olink platform (72% versus 43%). A considerable number of proteins had genomic associations that differed between the platforms. We provide examples where differences between platforms may influence conclusions drawn from the integration of protein levels with the study of diseases. We demonstrate how leveraging the diverse ancestries of participants in the UK Biobank helps to detect novel associations and refine genomic location. Our results show the value of the information provided by the two most commonly used high-throughput proteomics platforms and demonstrate the differences between them, which at times provide useful complementarity.
Subject(s)
Blood Proteins, Disease Susceptibility, Genomics, Genotype, Phenotype, Proteomics, Humans, Africa/ethnology, South Asia/ethnology, Biological Specimen Banks, Blood Proteins/analysis, Blood Proteins/genetics, Datasets as Topic, Genome, Human/genetics, Iceland/ethnology, Ireland/ethnology, Plasma/chemistry, Proteome/analysis, Proteome/genetics, Proteomics/methods, Quantitative Trait Loci, United Kingdom
ABSTRACT
Detailed knowledge of how diversity in the sequence of the human genome affects phenotypic diversity depends on a comprehensive and reliable characterization of both sequences and phenotypic variation. Over the past decade, insights into this relationship have been obtained from whole-exome sequencing or whole-genome sequencing of large cohorts with rich phenotypic data [1,2]. Here we describe the analysis of whole-genome sequencing of 150,119 individuals from the UK Biobank [3]. This constitutes a set of high-quality variants, including 585,040,410 single-nucleotide polymorphisms, representing 7.0% of all possible human single-nucleotide polymorphisms, and 58,707,036 indels. This large set of variants allows us to characterize selection based on sequence variation within a population through a depletion rank score of windows along the genome. Depletion rank analysis shows that coding exons represent a small fraction of regions in the genome subject to strong sequence conservation. We define three cohorts within the UK Biobank: a large British Irish cohort, a smaller African cohort and a South Asian cohort. A haplotype reference panel is provided that allows reliable imputation of most variants carried by three or more sequenced individuals. We identified 895,055 structural variants and 2,536,688 microsatellites, groups of variants typically excluded from large-scale whole-genome sequencing studies. Using this formidable new resource, we provide several examples of trait associations for rare variants with large effects not found previously through studies based on whole-exome sequencing and/or imputation.
Subject(s)
Biological Specimen Banks, Databases, Genetic, Genetic Variation, Genome, Human, Genomics, Whole Genome Sequencing, Africa/ethnology, Asia/ethnology, Cohort Studies, Conserved Sequence, Exons/genetics, Genome, Human/genetics, Haplotypes/genetics, Humans, INDEL Mutation, Ireland/ethnology, Microsatellite Repeats, Polymorphism, Single Nucleotide/genetics, United Kingdom
ABSTRACT
Speciation is a continuous process during which genetic changes gradually accumulate in the genomes of diverging species. Recent studies have documented highly heterogeneous differentiation landscapes, with distinct regions of elevated differentiation ("differentiation islands") widespread across genomes. However, it remains unclear which processes drive the evolution of differentiation islands; how the differentiation landscape evolves as speciation advances; and ultimately, how differentiation islands are related to speciation. Here, we addressed these questions based on population genetic analyses of 200 resequenced genomes from 10 populations of four Ficedula flycatcher sister species. We show that a heterogeneous differentiation landscape starts emerging among populations within species, and differentiation islands evolve recurrently in the very same genomic regions among independent lineages. Contrary to expectations from models that interpret differentiation islands as genomic regions involved in reproductive isolation that are shielded from gene flow, patterns of sequence divergence (d(xy) and relative node depth) do not support a major role of gene flow in the evolution of the differentiation landscape in these species. Instead, as predicted by models of linked selection, genome-wide variation in diversity and differentiation can be explained by variation in recombination rate and the density of targets for selection. We thus conclude that the heterogeneous landscape of differentiation in Ficedula flycatchers evolves mainly as the result of background selection and selective sweeps in genomic regions of low recombination. Our results emphasize the necessity of incorporating linked selection as a null model to identify genome regions involved in adaptation and speciation.
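The contrast the abstract draws between relative differentiation and absolute divergence (d(xy)) can be illustrated with a minimal sketch. It assumes biallelic sites, Hudson-style FST computed as one minus the ratio of mean within-population diversity to d(xy), and no sample-size correction. The allele frequencies and the window length are invented for illustration and are not taken from the study, and this is not the authors' pipeline.

```python
def pi_site(p):
    """Expected heterozygosity at one biallelic site (no sample-size correction)."""
    return 2.0 * p * (1.0 - p)


def dxy_site(p1, p2):
    """Probability that one allele drawn from each population differs."""
    return p1 * (1.0 - p2) + p2 * (1.0 - p1)


def window_stats(freqs1, freqs2, window_len):
    """Return (dxy, fst) for one window.

    freqs1/freqs2 hold allele frequencies at the variable sites only;
    window_len is the total number of sites, invariant ones included.
    """
    dxy_sum = sum(dxy_site(a, b) for a, b in zip(freqs1, freqs2))
    pi_sum = sum((pi_site(a) + pi_site(b)) / 2.0 for a, b in zip(freqs1, freqs2))
    fst = 1.0 - pi_sum / dxy_sum if dxy_sum > 0 else 0.0
    return dxy_sum / window_len, fst


# Hypothetical "background" window: many shared polymorphisms.
bg_dxy, bg_fst = window_stats([0.4] * 10, [0.6] * 10, window_len=1000)

# Hypothetical "island" window: diversity within each population is reduced,
# a few sites are fixed for different alleles, and dxy is nearly unchanged.
isl_dxy, isl_fst = window_stats(
    [0.0, 0.0, 0.0, 0.0, 0.5, 0.5],
    [1.0, 1.0, 1.0, 1.0, 0.5, 0.5],
    window_len=1000,
)

print(f"background: dxy={bg_dxy:.4f} FST={bg_fst:.3f}")
print(f"island:     dxy={isl_dxy:.4f} FST={isl_fst:.3f}")
```

In this toy case the island window has much higher FST than the background while d(xy) is essentially the same. That is the pattern expected when reduced within-population diversity, rather than reduced gene flow, drives the differentiation peak.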
Subject(s)
Genetic Speciation, Passeriformes/classification, Passeriformes/genetics, Recombination, Genetic, Selection, Genetic, Animals, Female, Gene Flow, Genetics, Population, Genome, Genomics, Genotyping Techniques, Male, Polymorphism, Single Nucleotide, Reproductive Isolation, Sequence Analysis, DNA, Species Specificity
ABSTRACT
Unravelling the genomic landscape of divergence between lineages is key to understanding speciation. The naturally hybridizing collared flycatcher and pied flycatcher are important avian speciation models that show pre- as well as postzygotic isolation. We sequenced and assembled the 1.1-Gb flycatcher genome, physically mapped the assembly to chromosomes using a low-density linkage map and re-sequenced population samples of each species. Here we show that the genomic landscape of species differentiation is highly heterogeneous with approximately 50 'divergence islands' showing up to 50-fold higher sequence divergence than the genomic background. These non-randomly distributed islands, with between one and three regions of elevated divergence per chromosome irrespective of chromosome size, are characterized by reduced levels of nucleotide diversity, skewed allele-frequency spectra, elevated levels of linkage disequilibrium and reduced proportions of shared polymorphisms in both species, indicative of parallel episodes of selection. Proximity of divergence peaks to genomic regions resistant to sequence assembly, potentially including centromeres and telomeres, indicates that complex repeat structures may drive species divergence. A much higher background level of species divergence of the Z chromosome, and a lower proportion of shared polymorphisms, indicate that sex chromosomes and autosomes are at different stages of speciation. This study provides a roadmap to the emerging field of speciation genomics.
Subject(s)
Genetic Speciation, Genome/genetics, Songbirds/genetics, Animals, Biodiversity, Centromere/genetics, Chromosomes/genetics, Gene Frequency, Genetic Variation, Genomics, Male, Molecular Sequence Data, Phylogeny, Selection, Genetic/genetics, Songbirds/classification, Species Specificity, Telomere/genetics
ABSTRACT
Uncertainty about the phase of strings of SNPs creates complications in genetic analysis, although methods have been developed for phasing population-based samples. However, these methods can only phase a small number of SNPs effectively and become unreliable when applied to SNPs spanning many linkage disequilibrium (LD) blocks. Here we show how to phase more than 1,000 SNPs simultaneously for a large fraction of the 35,528 Icelanders genotyped by Illumina chips. Moreover, haplotypes that are identical by descent (IBD) between close and distant relatives, for example, those separated by ten meioses or more, can often be reliably detected. This method is particularly powerful in studies of the inheritance of recurrent mutations and fine-scale recombinations in large sample sets. A further extension of the method allows us to impute long haplotypes for individuals who are not genotyped.
Subject(s)
Algorithms, Haplotypes, Major Histocompatibility Complex, Models, Genetic, Polymorphism, Single Nucleotide, Base Sequence, Female, Gene Deletion, Genetic Markers, Genetics, Population, Humans, Iceland, Inheritance Patterns, Male
ABSTRACT
Profound knowledge of demographic history is a prerequisite for the understanding and inference of processes involved in the evolution of population differentiation and speciation. Together with new coalescent-based methods, the recent availability of genome-wide data enables investigation of differentiation and divergence processes at unprecedented depth. We combined two powerful approaches, full Approximate Bayesian Computation analysis (ABC) and pairwise sequentially Markovian coalescent modeling (PSMC), to reconstruct the demographic history of the split between two avian speciation model species, the pied flycatcher and collared flycatcher. Using whole-genome re-sequencing data from 20 individuals, we investigated 15 demographic models including different levels and patterns of gene flow, and changes in effective population size over time. ABC provided high support for recent (mode 0.3 my, range <0.7 my) species divergence, declines in effective population size of both species since their initial divergence, and unidirectional recent gene flow from pied flycatcher into collared flycatcher. The estimated divergence time and population size changes, supported by PSMC results, suggest that the ancestral species persisted through one of the glacial periods of middle Pleistocene and then split into two large populations that first increased in size before going through severe bottlenecks and expanding into their current ranges. Secondary contact appears to have been established after the last glacial maximum. The severity of the bottlenecks at the last glacial maximum is indicated by the discrepancy between current effective population sizes (20,000-80,000) and census sizes (5-50 million birds) of the two species. The recent divergence time challenges the supposition that avian speciation is a relatively slow process with extended times for intrinsic postzygotic reproductive barriers to evolve. 
Our study emphasizes the importance of using genome-wide data to unravel tangled demographic histories. Moreover, it constitutes one of the first examples of the inference of divergence history from genome-wide data in non-model species.
Subject(s)
Biological Evolution, Genetic Speciation, Selection, Genetic, Songbirds/genetics, Animals, Bayes Theorem, Gene Flow, Genetic Variation, Markov Chains, Population Density, Sequence Analysis, DNA
ABSTRACT
Effects of susceptibility variants may depend on which parent they are inherited from. Although many associations between sequence variants and human traits have been discovered through genome-wide associations, the impact of parental origin has largely been ignored. Here we show that for 38,167 Icelanders genotyped using single nucleotide polymorphism (SNP) chips, the parental origin of most alleles can be determined. For this we used a combination of genealogy and long-range phasing. We then focused on SNPs that associate with diseases and are within 500 kilobases of known imprinted genes. Seven independent SNP associations were examined. Five (one with breast cancer, one with basal-cell carcinoma and three with type 2 diabetes) have parental-origin-specific associations. These variants are located in two genomic regions, 11p15 and 7q32, each harbouring a cluster of imprinted genes. Furthermore, we observed a novel association between the SNP rs2334499 at 11p15 and type 2 diabetes. Here the allele that confers risk when paternally inherited is protective when maternally transmitted. We identified a differentially methylated CTCF-binding site at 11p15 and demonstrated correlation of rs2334499 with decreased methylation of that site.
Subject(s)
Fathers, Genetic Predisposition to Disease/genetics, Mothers, Polymorphism, Single Nucleotide/genetics, Alleles, Binding Sites, Breast Neoplasms/genetics, CCCTC-Binding Factor, Carcinoma, Basal Cell/genetics, Chromosomes, Human, Pair 11/genetics, Chromosomes, Human, Pair 7/genetics, DNA Methylation/genetics, Diabetes Mellitus, Type 2/genetics, Female, Genome, Human/genetics, Genomic Imprinting/genetics, Haplotypes, Humans, Iceland, Male, Pedigree, Repressor Proteins/metabolism
ABSTRACT
Schizophrenia is a complex disorder, caused by both genetic and environmental factors and their interactions. Research on pathogenesis has traditionally focused on neurotransmitter systems in the brain, particularly those involving dopamine. Schizophrenia has been considered a separate disease for over a century, but in the absence of clear biological markers, diagnosis has historically been based on signs and symptoms. A fundamental message emerging from genome-wide association studies of copy number variations (CNVs) associated with the disease is that its genetic basis does not necessarily conform to classical nosological disease boundaries. Certain CNVs confer not only high relative risk of schizophrenia but also of other psychiatric disorders. The structural variations associated with schizophrenia can involve several genes and the phenotypic syndromes, or the 'genomic disorders', have not yet been characterized. Single nucleotide polymorphism (SNP)-based genome-wide association studies with the potential to implicate individual genes in complex diseases may reveal underlying biological pathways. Here we combined SNP data from several large genome-wide scans and followed up the most significant association signals. We found significant association with several markers spanning the major histocompatibility complex (MHC) region on chromosome 6p21.3-22.1, a marker located upstream of the neurogranin gene (NRGN) on 11q24.2 and a marker in intron four of transcription factor 4 (TCF4) on 18q21.2. Our findings implicating the MHC region are consistent with an immune component to schizophrenia risk, whereas the association with NRGN and TCF4 points to perturbation of pathways involved in brain development, memory and cognition.
Subject(s)
Genetic Predisposition to Disease/genetics, Polymorphism, Single Nucleotide/genetics, Schizophrenia/genetics, Basic Helix-Loop-Helix Leucine Zipper Transcription Factors, Chromosomes, Human, Pair 11/genetics, Chromosomes, Human, Pair 18/genetics, Chromosomes, Human, Pair 6/genetics, DNA-Binding Proteins/genetics, Genetic Markers/genetics, Genome, Human/genetics, Genome-Wide Association Study, Genotype, Humans, Major Histocompatibility Complex/genetics, Neurogranin/genetics, Schizophrenia/immunology, Transcription Factor 4, Transcription Factors/genetics
ABSTRACT
Detailed linkage and recombination rate maps are necessary to use the full potential of genome sequencing and population genomic analyses. We used a custom collared flycatcher 50 K SNP array to develop a high-density linkage map with 37 262 markers assigned to 34 linkage groups in 33 autosomes and the Z chromosome. The best-order map contained 4215 markers, with a total distance of 3132 cM and a mean genetic distance between markers of 0.12 cM. Facilitated by the array being designed to include markers from most scaffolds, we obtained a second-generation assembly of the flycatcher genome that approaches full chromosome sequences (N50 super-scaffold size 20.2 Mb and with 1.042 Gb (of 1.116 Gb) anchored to and mostly ordered and oriented along chromosomes). We found that flycatcher and zebra finch chromosomes are entirely syntenic but that inversions at mean rates of 1.5-2.0 events (6.6-7.5 Mb) per My have changed the organization within chromosomes, rates high enough for inversions to potentially have been involved with many speciation events during avian evolution. The mean recombination rate was 3.1 cM/Mb and correlated closely with chromosome size, from 2 cM/Mb for chromosomes >100 Mb to >10 cM/Mb for chromosomes <10 Mb. This size dependence seemed entirely due to an obligate recombination event per chromosome; if 50 cM was subtracted from the genetic lengths of chromosomes, the rate per physical unit DNA was constant across chromosomes. Flycatcher recombination rate showed variation along chromosomes similar to that in chicken but lacked the large interior recombination deserts characteristic of zebra finch chromosomes.
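The size dependence described above can be sketched with a simple additive model: each chromosome contributes 50 cM from one obligate recombination event, plus a baseline rate that is constant per Mb. The 50 cM term comes from the abstract. The baseline of 1.5 cM/Mb and the chromosome sizes are hypothetical values chosen only to illustrate the trend.

```python
OBLIGATE_CM = 50.0          # one obligate crossover per chromosome (from the abstract)
BASELINE_CM_PER_MB = 1.5    # hypothetical constant rate per physical unit of DNA


def genetic_length_cm(size_mb, baseline=BASELINE_CM_PER_MB):
    """Genetic length under the additive model: obligate term plus baseline."""
    return OBLIGATE_CM + baseline * size_mb


def observed_rate(size_mb, baseline=BASELINE_CM_PER_MB):
    """Apparent recombination rate (cM/Mb), which rises as chromosomes shrink."""
    return genetic_length_cm(size_mb, baseline) / size_mb


def corrected_rate(size_mb, length_cm):
    """Rate after subtracting the obligate 50 cM, as done in the abstract."""
    return (length_cm - OBLIGATE_CM) / size_mb


for size in (100.0, 20.0, 5.0):  # hypothetical chromosome sizes in Mb
    length = genetic_length_cm(size)
    print(f"{size:6.1f} Mb  observed={observed_rate(size):5.2f} cM/Mb  "
          f"corrected={corrected_rate(size, length):4.2f} cM/Mb")
```

With these assumed values a 100 Mb chromosome shows about 2 cM/Mb and a 5 Mb chromosome more than 10 cM/Mb, while the corrected rate is identical for both. This is the qualitative pattern the abstract reports.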
Subject(s)
Evolution, Molecular, Genetic Linkage, Recombination, Genetic, Songbirds/genetics, Animals, Chickens, Chromosome Mapping, Female, Finches, Genome, Genotyping Techniques, Male, Molecular Sequence Data, Polymorphism, Single Nucleotide, Sequence Analysis, DNA, Synteny
ABSTRACT
Microsatellites are polymorphic tracts of short tandem repeats with one to six base-pair (bp) motifs and are some of the most polymorphic variants in the genome. Using 6084 Icelandic parent-offspring trios we estimate 63.7 (95% CI: 61.9-65.4) microsatellite de novo mutations (mDNMs) per offspring per generation; excluding one-bp repeat motifs (homopolymers), the estimate is 48.2 mDNMs (95% CI: 46.7-49.6). Paternal mDNMs occur at longer repeats than maternal ones, which are in turn larger, with a mean size of 3.4 bp vs 3.1 bp for paternal ones. mDNMs increase by 0.97 (95% CI: 0.90-1.04) and 0.31 (95% CI: 0.25-0.37) per year of father's and mother's age at conception, respectively. Here, we find two independent coding variants that associate with the number of mDNMs transmitted to offspring. The minor allele of a missense variant (allele frequency (AF) = 1.9%) in MSH2, a mismatch repair gene, increases transmitted mDNMs from both parents (effect: 13.1 paternal and 7.8 maternal mDNMs). A synonymous variant (AF = 20.3%) in NEIL2, a DNA damage repair gene, increases paternally transmitted mDNMs (effect: 4.4 mDNMs). Thus, the microsatellite mutation rate in humans is in part under genetic control.
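The parental age effects can be turned into a small worked example. The sketch below uses only the two point estimates reported above (0.97 and 0.31 mDNMs per year). It assumes the effects are linear and additive, and it computes differences between hypothetical parental ages instead of absolute counts, because the abstract gives no intercept.

```python
PATERNAL_PER_YEAR = 0.97  # point estimate from the abstract
MATERNAL_PER_YEAR = 0.31  # point estimate from the abstract


def expected_extra_mdnms(delta_father_years, delta_mother_years):
    """Expected change in mDNM count for a given change in parental ages.

    Assumes linear, additive age effects; confidence intervals are ignored.
    """
    return (PATERNAL_PER_YEAR * delta_father_years
            + MATERNAL_PER_YEAR * delta_mother_years)


# Hypothetical comparison: both parents ten years older at conception.
extra = expected_extra_mdnms(10, 10)
ratio = PATERNAL_PER_YEAR / MATERNAL_PER_YEAR

print(f"extra mDNMs for +10 years in both parents: {extra:.1f}")
print(f"paternal-to-maternal age effect ratio: {ratio:.2f}")
```

Under these assumptions, ten additional years for both parents corresponds to roughly 13 more mDNMs, which is about a fifth of the per-generation estimate of 63.7.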
Subject(s)
DNA Mismatch Repair, Germ-Line Mutation, Humans, Alleles, Germ-Line Mutation/genetics, Microsatellite Repeats/genetics, Germ Cells
ABSTRACT
Migraine is a complex neurovascular disease with a range of severity and symptoms, yet mostly studied as one phenotype in genome-wide association studies (GWAS). Here we combine large GWAS datasets from six European populations to study the main migraine subtypes, migraine with aura (MA) and migraine without aura (MO). We identified four new MA-associated variants (in PRRT2, PALMD, ABO and LRRK2) and classified 13 MO-associated variants. Rare variants with large effects highlight three genes. A rare frameshift variant in brain-expressed PRRT2 confers large risk of MA and epilepsy, but not MO. A burden test of rare loss-of-function variants in SCN11A, encoding a neuron-expressed sodium channel with a key role in pain sensation, shows strong protection against migraine. Finally, a rare variant with cis-regulatory effects on KCNK5 confers large protection against migraine and brain aneurysms. Our findings offer new insights with therapeutic potential into the complex biology of migraine and its subtypes.
Subject(s)
Epilepsy, Migraine Disorders, Migraine with Aura, Humans, Genome-Wide Association Study, Migraine Disorders/genetics, Migraine with Aura/genetics, Phenotype
ABSTRACT
Deletions within the neurexin 1 gene (NRXN1; 2p16.3) are associated with autism and have also been reported in two families with schizophrenia. We examined NRXN1, and the closely related NRXN2 and NRXN3 genes, for copy number variants (CNVs) in 2977 schizophrenia patients and 33 746 controls from seven European populations (Iceland, Finland, Norway, Germany, The Netherlands, Italy and UK) using microarray data. We found 66 deletions and 5 duplications in NRXN1, including a de novo deletion: 12 deletions and 2 duplications occurred in schizophrenia cases (0.47%) compared to 49 and 3 (0.15%) in controls. There was no common breakpoint and the CNVs varied from 18 to 420 kb. No CNVs were found in NRXN2 or NRXN3. We performed a Cochran-Mantel-Haenszel exact test to estimate association between all CNVs and schizophrenia (P = 0.13; OR = 1.73; 95% CI 0.81-3.50). Because the penetrance of NRXN1 CNVs may vary according to the level of functional impact on the gene, we next restricted the association analysis to CNVs that disrupt exons (0.24% of cases and 0.015% of controls). These were significantly associated with a high odds ratio (P = 0.0027; OR 8.97, 95% CI 1.8-51.9). We conclude that NRXN1 deletions affecting exons confer risk of schizophrenia.
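The carrier frequencies quoted above can be reproduced from the counts, and a pooled 2x2 table gives a useful point of comparison. The sketch below is a crude, unstratified calculation. It is not the Cochran-Mantel-Haenszel procedure used in the abstract, which combines stratum-specific tables (presumably one per population, although the abstract does not say). The pooled odds ratio is therefore not expected to match the reported 1.73.

```python
def carrier_frequency(carriers, total):
    """Fraction of individuals carrying a CNV."""
    return carriers / total


def crude_odds_ratio(case_carriers, n_cases, control_carriers, n_controls):
    """Odds ratio from a single pooled 2x2 table (no stratification)."""
    a = case_carriers                  # cases with a CNV
    b = n_cases - case_carriers        # cases without
    c = control_carriers               # controls with a CNV
    d = n_controls - control_carriers  # controls without
    return (a * d) / (b * c)


# Counts from the abstract: 12 deletions + 2 duplications in 2977 cases,
# 49 deletions + 3 duplications in 33 746 controls.
case_freq = carrier_frequency(12 + 2, 2977)
control_freq = carrier_frequency(49 + 3, 33746)
pooled_or = crude_odds_ratio(12 + 2, 2977, 49 + 3, 33746)

print(f"cases:    {case_freq:.2%}")
print(f"controls: {control_freq:.2%}")
print(f"crude pooled OR: {pooled_or:.2f}")
```

The pooled estimate is larger than the stratified one, which shows how much an unstratified analysis can differ when carrier frequencies and case-control ratios vary across the contributing samples.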
Subject(s)
Gene Silencing, Nerve Tissue Proteins/genetics, Schizophrenia/genetics, Adolescent, Adult, Calcium-Binding Proteins, Case-Control Studies, Cell Adhesion Molecules, Neuronal, Exons, Female, Gene Deletion, Gene Dosage, Gene Duplication, Genetic Predisposition to Disease, Humans, Male, Neural Cell Adhesion Molecules, White People/genetics, Young Adult
ABSTRACT
Barnacles are key marine crustaceans in several habitats, and they constitute a common practical problem by causing biofouling on man-made marine constructions and ships. Despite causing considerable ecological and economic impacts, there is a surprising void of basic genomic knowledge, and a barnacle reference genome is lacking. We here set out to characterize the genome of the bay barnacle Balanus improvisus (= Amphibalanus improvisus) based on short-read whole-genome sequencing and experimental genome size estimation. We show both experimentally (DNA staining and flow cytometry) and computationally (k-mer analysis) that B. improvisus has a haploid genome size of ~ 740 Mbp. A pilot genome assembly rendered a total assembly size of ~ 600 Mbp and was highly fragmented with an N50 of only 2.2 kbp. Further assembly-based and assembly-free analyses revealed that the very limited assembly contiguity is due to the B. improvisus genome having an extremely high nucleotide diversity (π) in coding regions (average π ≈ 5% and average π in fourfold degenerate sites ≈ 20%), and an overall high repeat content (at least 40%). We also report on high variation in the α-octopamine receptor OctA (average π = 3.6%), which might increase the risk that barnacle populations evolve resistance toward antifouling agents. The genomic features described here can help in planning for a future high-quality reference genome, which is urgently needed to properly explore and understand proteins of interest in barnacle biology and marine biotechnology and for developing better antifouling strategies.
Subject(s)
Genome, Thoracica/genetics, Animals, Biofouling, Nucleotides, Receptors, Biogenic Amine/genetics
ABSTRACT
Multiple myeloma (MM) is caused by the uncontrolled, clonal expansion of plasma cells. While there is epidemiological evidence for inherited susceptibility, the molecular basis remains incompletely understood. We report a genome-wide association study totalling 5,320 cases and 422,289 controls from four Nordic populations, and find a novel MM risk variant at SOHLH2 at 13q13.3 (risk allele frequency = 3.5%; odds ratio = 1.38; P = 2.2 × 10⁻¹⁴). This gene encodes a transcription factor involved in gametogenesis that is normally only weakly expressed in plasma cells. The association is represented by 14 variants in linkage disequilibrium. Among these, rs75712673 maps to a genomic region with open chromatin in plasma cells, and upregulates SOHLH2 in this cell type. Moreover, rs75712673 influences transcriptional activity in luciferase assays, and shows a chromatin looping interaction with the SOHLH2 promoter. Our work provides novel insight into MM susceptibility.
Subject(s)
Basic Helix-Loop-Helix Transcription Factors/genetics, Multiple Myeloma/genetics, Aged, Female, Gene Frequency, Genetic Predisposition to Disease, Genome-Wide Association Study, Germ Cells/metabolism, Germ-Line Mutation, Humans, Linkage Disequilibrium, Male, Polymorphism, Single Nucleotide
ABSTRACT
Bell's palsy is the most common cause of unilateral facial paralysis and is defined as an idiopathic and acute inability to control movements of the facial muscles on the affected side. While the pathogenesis remains unknown, previous studies have implicated post-viral inflammation and resulting compression of the facial nerve. Reported heritability estimates of 4-14% suggest a genetic component in the etiology and an autosomal dominant inheritance has been proposed. Here, we report findings from a meta-analysis of genome-wide association studies uncovering the first unequivocal association with Bell's palsy (rs9357446-A; P = 6.79 × 10⁻²³, OR = 1.23; N_cases = 4714, N_controls = 1,011,520). The variant also confers risk of intervertebral disc disorders (P = 2.99 × 10⁻¹¹, OR = 1.04) suggesting a common pathogenesis in part or a true pleiotropy.
Subject(s)
Bell Palsy/genetics, Adult, Aged, Facial Muscles/pathology, Facial Nerve/pathology, Facial Paralysis/genetics, Female, Genome-Wide Association Study/methods, Humans, Inflammation/genetics, Male, Middle Aged, Movement/physiology, Prospective Studies, Risk
ABSTRACT
The success of genome-wide association studies (GWAS) in identifying common, low-penetrance variant-cancer associations for the past decade is undisputed. However, discovering additional high-penetrance cancer mutations in unknown cancer predisposing genes requires detection of variant-cancer association of ultra-rare coding variants. Consequently, large-scale next-generation sequence data with associated phenotype information are needed. Here, we used genotype data on 166,281 Icelanders, of whom 49,708 were whole-genome sequenced, and 408,595 individuals from the UK Biobank, of whom 41,147 were whole-exome sequenced, to test for association between loss-of-function burden in autosomal genes and basal cell carcinoma (BCC), the most common cancer in Caucasians. A total of 25,205 BCC cases and 683,058 controls were tested. Rare germline loss-of-function variants in PTPN14 conferred substantial risks of BCC (OR, 8.0; P = 1.9 × 10⁻¹²), with a quarter of carriers getting BCC before age 70 and over half in their lifetime. Furthermore, common variants at the PTPN14 locus were associated with BCC, suggesting PTPN14 as a new, high-impact BCC predisposition gene. A follow-up investigation of 24 cancers and three benign tumor types showed that PTPN14 loss-of-function variants are associated with high risk of cervical cancer (OR, 12.7; P = 1.6 × 10⁻⁴) and low age at diagnosis. Our findings, using power-increasing methods with high-quality rare variant genotypes, highlight future prospects for new discoveries on carcinogenesis. SIGNIFICANCE: This study identifies the tumor-suppressor gene PTPN14 as a high-impact BCC predisposition gene and indicates that inactivation of PTPN14 by germline sequence variants may also lead to increased risk of cervical cancer.
Subject(s)
Carcinoma, Basal Cell/genetics, Loss of Function Mutation, Penetrance, Protein Tyrosine Phosphatases, Non-Receptor/genetics, Skin Neoplasms/genetics, Uterine Cervical Neoplasms/genetics, Age Factors, Carcinoma, Basal Cell/epidemiology, Case-Control Studies, Female, Gene Frequency, Genes, Tumor Suppressor, Genetic Predisposition to Disease, Genetic Testing, Genome-Wide Association Study, Genotype, Germ-Line Mutation, Humans, Iceland/epidemiology, Male, Odds Ratio, Skin Neoplasms/epidemiology, Tissue Banks/statistics & numerical data, United Kingdom/epidemiology, Uterine Cervical Neoplasms/epidemiology, Exome Sequencing/statistics & numerical data, Whole Genome Sequencing/statistics & numerical data
ABSTRACT
We performed a systematic, large-scale analysis of human protein complexes comprising gene products implicated in many different categories of human disease to create a phenome-interactome network. This was done by integrating quality-controlled interactions of human proteins with a validated, computationally derived phenotype similarity score, permitting identification of previously unknown complexes likely to be associated with disease. Using a phenomic ranking of protein complexes linked to human disease, we developed a Bayesian predictor that in 298 of 669 linkage intervals correctly ranks the known disease-causing protein as the top candidate, and in 870 intervals with no identified disease-causing gene, provides novel candidates implicated in disorders such as retinitis pigmentosa, epithelial ovarian cancer, inflammatory bowel disease, amyotrophic lateral sclerosis, Alzheimer disease, type 2 diabetes and coronary heart disease. Our publicly available draft of protein complexes associated with pathology comprises 506 complexes, which reveal functional relationships between disease-promoting genes that will inform future experimentation.
Subject(s)
Genetic Predisposition to Disease/genetics, Protein Conformation, Protein Interaction Mapping, Proteins/adverse effects, Proteome/genetics, Proteomics, Bayes Theorem, Databases, Genetic, Databases, Protein, Genetic Diseases, Inborn, Humans, Mutation, Phenotype, Proteins/genetics
ABSTRACT
The number of national reference populations that are whole-genome sequenced is rapidly increasing. Partly driving this development is the fact that genetic disease studies benefit from knowing the genetic variation typical for the geographical area of interest. A whole-genome sequenced Swedish national reference population (n = 1000) has been recently published but with few samples from northern Sweden. In the present study we have whole-genome sequenced a control population (n = 300) (ACpop) from Västerbotten County, a sparsely populated region in northern Sweden previously shown to be genetically different from southern Sweden. The aggregated variant frequencies within ACpop are publicly available (DOI 10.17044/NBIS/G000005) to function as a basic resource in clinical genetics and for genetic studies. Our analysis of ACpop, representing approximately 0.11% of the population in Västerbotten, indicates the presence of a genetic substructure within the county. Furthermore, a demographic analysis showed that the population from which samples were drawn was to a large extent geographically stationary, a finding that was corroborated in the genetic analysis down to the level of municipalities. Including ACpop in the reference population when imputing unknown variants in a Västerbotten cohort resulted in a strong increase in the number of high-confidence imputed variants (up to 81% for variants with minor allele frequency < 5%). ACpop was initially designed for cancer disease studies, but the genetic structure within the cohort will be of general interest for all genetic disease studies in northern Sweden.
Subject(s)
Genome, Human, Polymorphism, Genetic, Population/genetics, Aged, Aged, 80 and over, Databases, Genetic, Female, Humans, Male, Middle Aged, Sweden, Whole Genome Sequencing
ABSTRACT
Whole-genome sequencing (WGS) is a fundamental technology for research to advance precision medicine, but the limited availability of portable and user-friendly workflows for WGS analyses poses a major challenge for many research groups and hampers scientific progress. Here we present Sarek, an open-source workflow to detect germline variants and somatic mutations based on sequencing data from WGS, whole-exome sequencing (WES), or gene panels. Sarek features (i) easy installation, (ii) robust portability across different computer environments, (iii) comprehensive documentation, (iv) transparent and easy-to-read code, and (v) extensive quality metrics reporting. Sarek is implemented in the Nextflow workflow language and supports both Docker and Singularity containers as well as Conda environments, making it ideal for easy deployment on any POSIX-compatible computers and cloud compute environments. Sarek follows the GATK best-practice recommendations for read alignment and pre-processing, and includes a wide range of software for the identification and annotation of germline and somatic single-nucleotide variants, insertion and deletion variants, structural variants, tumour sample purity, and variations in ploidy and copy number. Sarek offers easy, efficient, and reproducible WGS analyses, and can readily be used both as a production workflow at sequencing facilities and as a powerful stand-alone tool for individual research groups. The Sarek source code, documentation and installation instructions are freely available at https://github.com/nf-core/sarek and at https://nf-co.re/sarek/.