Results 1 - 20 of 95
1.
Nat Rev Genet ; 23(11): 665-679, 2022 11.
Article in English | MEDLINE | ID: mdl-35581355

ABSTRACT

Genome-wide association studies using large-scale genome and exome sequencing data have become increasingly valuable in identifying associations between genetic variants and disease, transforming basic research and translational medicine. However, this progress has not been equally shared across all people and conditions, in part due to limited resources. Leveraging publicly available sequencing data as external common controls, rather than sequencing new controls for every study, can better allocate resources by augmenting control sample sizes or providing controls where none existed. However, common control studies must be carefully planned and executed as even small differences in sample ascertainment and processing can result in substantial bias. Here, we discuss challenges and opportunities for the robust use of common controls in high-throughput sequencing studies, including study design, quality control and statistical approaches. Thoughtful generation and use of large and valuable genetic sequencing data sets will enable investigation of a broader and more representative set of conditions, environments and genetic ancestries than otherwise possible.


Subject(s)
Exome , Genome-Wide Association Study , Exome/genetics , Genetic Predisposition to Disease , High-Throughput Nucleotide Sequencing , Humans , Exome Sequencing
2.
Am J Hum Genet ; 109(6): 1055-1064, 2022 06 02.
Article in English | MEDLINE | ID: mdl-35588732

ABSTRACT

Polygenic risk scores (PRSs) quantify the contribution of multiple genetic loci to an individual's likelihood of a complex trait or disease. However, existing PRSs estimate this likelihood with common genetic variants, excluding the impact of rare variants. Here, we report on a method to identify rare variants associated with outlier gene expression and integrate their impact into PRS predictions for body mass index (BMI), obesity, and bariatric surgery. Between the top and bottom 10%, we observed a 20.8% increase in risk for obesity (p = 3 × 10⁻¹⁴), 62.3% increase in risk for severe obesity (p = 1 × 10⁻⁶), and median 5.29 years earlier onset for bariatric surgery (p = 0.008), as a function of expression outlier-associated rare variant burden when controlling for common variant PRS. We show that these predictions were more significant than integrating the effects of rare protein-truncating variants (PTVs), observing a mean 19% increase in phenotypic variance explained with expression outlier-associated rare variants when compared with PTVs (p = 2 × 10⁻¹⁵). We replicated these findings by using data from the Million Veteran Program and demonstrated that PRSs across multiple traits and diseases can benefit from the inclusion of expression outlier-associated rare variants identified through population-scale transcriptome sequencing.


Subject(s)
Multifactorial Inheritance , Obesity , Body Mass Index , Genetic Predisposition to Disease , Genome-Wide Association Study , Humans , Multifactorial Inheritance/genetics , Obesity/genetics , Phenotype , Risk Factors
3.
PLoS Genet ; 18(3): e1010105, 2022 03.
Article in English | MEDLINE | ID: mdl-35324888

ABSTRACT

We present a systematic assessment of polygenic risk score (PRS) prediction across more than 1,500 traits using genetic and phenotype data in the UK Biobank. We report 813 sparse PRS models with significant (p < 2.5 × 10⁻⁵) incremental predictive performance when compared against the covariate-only model that considers age, sex, types of genotyping arrays, and the principal component loadings of genotypes. We report a significant correlation between the number of genetic variants selected in the sparse PRS model and the incremental predictive performance (Spearman's ρ = 0.61, p = 2.2 × 10⁻⁵⁹ for quantitative traits; ρ = 0.21, p = 9.6 × 10⁻⁴ for binary traits). The sparse PRS model trained on European individuals showed limited transferability when evaluated on non-European individuals in the UK Biobank. We provide the PRS model weights on the Global Biobank Engine (https://biobankengine.stanford.edu/prs).


Subject(s)
Genome-Wide Association Study , Multifactorial Inheritance , Biological Specimen Banks , Genetic Predisposition to Disease , Humans , Multifactorial Inheritance/genetics , Phenotype , Risk Factors , United Kingdom
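When a sparse PRS model is applied to an individual, it reduces to a weighted sum of allele dosages over the variants the model selected. The minimal Python sketch below shows only that scoring step. It assumes a hypothetical dictionary-based input that maps variant IDs to effect-allele counts and variant IDs to weights. It does not reproduce the Global Biobank Engine weight files or the covariate-adjusted evaluation described in the abstract, and skipping missing genotypes is a simplifying assumption.

```python
from typing import Dict, Optional


def polygenic_score(dosages: Dict[str, Optional[float]],
                    weights: Dict[str, float]) -> float:
    """Weighted sum of effect-allele dosages over the variants in a sparse PRS model.

    dosages: variant ID -> effect-allele count (0, 1, 2), or None when missing.
    weights: variant ID -> per-allele weight; genotyped variants absent from
             the model contribute nothing.
    Missing genotypes are skipped (assumption; mean imputation is another option).
    """
    score = 0.0
    for variant, beta in weights.items():
        dosage = dosages.get(variant)
        if dosage is not None:
            score += beta * dosage
    return score
```

Take a hypothetical model with weights rs1 = 0.2 and rs2 = -0.1, applied to an individual with dosages rs1 = 2 and rs2 = 1. The score is 2 × 0.2 - 1 × 0.1 = 0.3, up to floating-point rounding.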
4.
Am J Hum Genet ; 108(8): 1401-1408, 2021 08 05.
Article in English | MEDLINE | ID: mdl-34216550

ABSTRACT

Precise interpretation of the effects of rare protein-truncating variants (PTVs) is important for accurate determination of variant impact. Current methods for assessing the ability of PTVs to induce nonsense-mediated decay (NMD) focus primarily on the position of the variant in the transcript. We used RNA sequencing of the Genotype Tissue Expression v.8 cohort to compute the efficiency of NMD using allelic imbalance for 2,320 rare (genome aggregation database minor allele frequency ≤ 1%) PTVs across 809 individuals in 49 tissues. We created an interpretable predictive model using penalized logistic regression in order to evaluate the comprehensive influence of variant annotation, tissue, and inter-individual variation on NMD. We found that variant position, allele frequency, the inclusion of ultra-rare and singleton variants, and conservation were predictive of allelic imbalance. Furthermore, we found that NMD effects were highly concordant across tissues and individuals. Due to this high consistency, we demonstrate in silico that utilizing peripheral tissues or cell lines provides accurate prediction of NMD for PTVs.


Subject(s)
Codon, Nonsense/genetics , Gene Expression Regulation , Genetic Diseases, Inborn/pathology , Genetic Variation , Mutation , Nonsense Mediated mRNA Decay , RNA, Messenger/genetics , Gene Frequency , Genetic Diseases, Inborn/genetics , Humans
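Some background on the allelic-imbalance measure: at a heterozygous PTV, balanced expression predicts that about half of the RNA-seq reads carry the PTV allele, and NMD lowers that fraction. The sketch below computes the ALT-read fraction and runs an exact two-sided binomial test against 0.5. The function names are hypothetical. The abstract does not say which statistic the study used, so this illustrates the general idea only and is not the paper's NMD-efficiency metric.

```python
from math import comb


def allelic_ratio(ref_reads: int, alt_reads: int) -> float:
    """Fraction of reads carrying the ALT (PTV) allele at a heterozygous site."""
    total = ref_reads + alt_reads
    if total == 0:
        raise ValueError("no reads cover the site")
    return alt_reads / total


def binomial_two_sided_p(alt_reads: int, total: int, p: float = 0.5) -> float:
    """Exact two-sided binomial test of alt_reads successes out of total trials.

    Sums the probabilities of all outcomes that are no more likely than the
    observed one under Binomial(total, p).
    """
    probs = [comb(total, k) * p ** k * (1 - p) ** (total - k) for k in range(total + 1)]
    observed = probs[alt_reads]
    # A small relative tolerance keeps outcomes tied with the observed one
    # from being dropped because of floating-point error.
    return min(1.0, sum(q for q in probs if q <= observed * (1 + 1e-7)))
```

A site with 30 REF and 10 ALT reads gives an ALT fraction of 0.25. This is the direction expected if the PTV-carrying transcript is being degraded.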
5.
Am J Hum Genet ; 108(12): 2354-2367, 2021 12 02.
Article in English | MEDLINE | ID: mdl-34822764

ABSTRACT

Whole-genome sequencing studies applied to large populations or biobanks with extensive phenotyping raise new analytic challenges. The need to consider many variants at a locus or group of genes simultaneously and the potential to study many correlated phenotypes with shared genetic architecture provide opportunities for discovery not addressed by the traditional one variant, one phenotype association study. Here, we introduce a Bayesian model comparison approach called MRP (multiple rare variants and phenotypes) for rare-variant association studies that considers correlation, scale, and direction of genetic effects across a group of genetic variants, phenotypes, and studies, requiring only summary statistic data. We apply our method to exome sequencing data (n = 184,698) across 2,019 traits from the UK Biobank, aggregating signals in genes. MRP demonstrates an ability to recover signals such as associations between PCSK9 and LDL cholesterol levels. We additionally find MRP effective in conducting meta-analyses in exome data. Non-biomarker findings include associations between MC1R and red hair color and skin color, IL17RA and monocyte count, and IQGAP2 and mean platelet volume. Finally, we apply MRP in a multi-phenotype setting; after clustering the 35 biomarker phenotypes based on genetic correlation estimates, we find that joint analysis of these phenotypes results in substantial power gains for gene-trait associations, such as in TNFRSF13B in one of the clusters containing diabetes- and lipid-related traits. Overall, we show that the MRP model comparison approach improves upon useful features from widely used meta-analysis approaches for rare-variant association analyses and prioritizes protective modifiers of disease risk.


Subject(s)
Genetic Variation , Genome-Wide Association Study , Models, Genetic , Bayes Theorem , Female , Humans , Male , Phenotype
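MRP compares models of summary statistics while accounting for correlation among variants, phenotypes and studies. The sketch below covers only the simplest special case: one variant, one phenotype and no correlation terms. It computes a Bayes factor that compares a null model (true effect 0) with an alternative in which the true effect is drawn from a normal prior, using only an effect estimate and its standard error. This is the familiar Wakefield-style approximate Bayes factor. The function name and the prior standard deviation are assumptions for illustration, and this is not the MRP implementation.

```python
import math


def log10_bayes_factor(beta_hat: float, se: float, prior_sd: float) -> float:
    """log10 Bayes factor (alternative vs null) from one summary statistic.

    Null:        true effect = 0,                so beta_hat ~ N(0, se^2).
    Alternative: true effect ~ N(0, prior_sd^2), so beta_hat ~ N(0, se^2 + prior_sd^2).
    The result is log10 of the ratio of these two marginal likelihoods.
    """
    v = se ** 2
    w = prior_sd ** 2
    # ln N(b; 0, v + w) - ln N(b; 0, v), simplified algebraically
    log_bf = 0.5 * math.log(v / (v + w)) + 0.5 * beta_hat ** 2 * w / (v * (v + w))
    return log_bf / math.log(10)
```

A null estimate (beta_hat = 0) always gives a negative log Bayes factor here, which slightly favors the null. A large z-score (beta_hat / se) pushes the result strongly positive.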
7.
Am J Hum Genet ; 106(5): 611-622, 2020 05 07.
Article in English | MEDLINE | ID: mdl-32275883

ABSTRACT

Population-scale biobanks that combine genetic data and high-dimensional phenotyping for a large number of participants provide an exciting opportunity to perform genome-wide association studies (GWAS) to identify genetic variants associated with diverse quantitative traits and diseases. A major challenge for GWAS in population biobanks is ascertaining disease cases from heterogeneous data sources such as hospital records, digital questionnaire responses, or interviews. In this study, we use genetic parameters, including genetic correlation, to evaluate whether GWAS performed using cases in the UK Biobank ascertained from hospital records, questionnaire responses, and family history of disease implicate similar disease genetics across a range of effect sizes. We find that hospital record and questionnaire GWAS largely identify similar genetic effects for many complex phenotypes and that combining both phenotyping methods improves power to detect genetic associations. We also show that family history GWAS, using cases ascertained from family history of disease, agrees with combined hospital record and questionnaire GWAS and that family history GWAS has better power to detect genetic associations for some phenotypes. Overall, this work demonstrates that digital phenotyping and unstructured phenotype data can be combined with structured data such as hospital records to identify cases for GWAS in biobanks and improve the ability of such studies to identify genetic associations.


Subject(s)
Disease/genetics , Genome-Wide Association Study , Phenotype , Asthma/genetics , Databases, Factual , Female , Genetics, Medical , Genotype , Humans , Male , Neoplasms/genetics , United Kingdom
8.
Biostatistics ; 23(2): 522-540, 2022 04 13.
Article in English | MEDLINE | ID: mdl-32989444

ABSTRACT

We develop a scalable and highly efficient algorithm to fit a Cox proportional hazards model by maximizing the $L^1$-regularized (Lasso) partial likelihood function, based on the Batch Screening Iterative Lasso (BASIL) method developed in Qian and others (2019). Our algorithm is particularly suitable for large-scale and high-dimensional data that do not fit in memory. The output of our algorithm is the full Lasso path, that is, the parameter estimates at all predefined regularization parameters, as well as their validation accuracy measured using the concordance index (C-index) or the validation deviance. To demonstrate the effectiveness of our algorithm, we analyze a large genotype-survival time dataset across 306 disease outcomes from the UK Biobank (Sudlow and others, 2015). We provide a publicly available implementation of the proposed approach for genetics data on top of the PLINK2 package and name it snpnet-Cox.


Subject(s)
Algorithms , Biological Specimen Banks , Humans , Likelihood Functions , Proportional Hazards Models , United Kingdom
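The C-index used for validation above can be computed directly from observed times, event indicators and predicted risk scores. Below is a minimal Python sketch of Harrell's C-index, assuming that a higher score means higher predicted risk. It is an O(n²) illustration rather than snpnet-Cox code, and as a simplification it ignores pairs with tied times.

```python
from typing import Sequence


def concordance_index(times: Sequence[float], events: Sequence[int],
                      risk: Sequence[float]) -> float:
    """Harrell's C-index for right-censored survival data.

    A pair (i, j) is comparable when subject i had an observed event
    (events[i] == 1) strictly before time j. The pair is concordant when the
    subject who failed first also has the higher predicted risk. Tied risk
    scores count as one half.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored subject cannot be the earlier failure in a pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable
```

A value of 1.0 means every comparable pair is ranked correctly, and 0.5 is what a constant risk score produces.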
9.
Nature ; 550(7675): 244-248, 2017 10 11.
Article in English | MEDLINE | ID: mdl-29022598

ABSTRACT

X chromosome inactivation (XCI) silences transcription from one of the two X chromosomes in female mammalian cells to balance expression dosage between XX females and XY males. XCI is, however, incomplete in humans: up to one-third of X-chromosomal genes are expressed from both the active and inactive X chromosomes (Xa and Xi, respectively) in female cells, with the degree of 'escape' from inactivation varying between genes and individuals. The extent to which XCI is shared between cells and tissues remains poorly characterized, as does the degree to which incomplete XCI manifests as detectable sex differences in gene expression and phenotypic traits. Here we describe a systematic survey of XCI, integrating over 5,500 transcriptomes from 449 individuals spanning 29 tissues from GTEx (v6p release) and 940 single-cell transcriptomes, combined with genomic sequence data. We show that XCI at 683 X-chromosomal genes is generally uniform across human tissues, but identify examples of heterogeneity between tissues, individuals and cells. We show that incomplete XCI affects at least 23% of X-chromosomal genes, identify seven genes that escape XCI with support from multiple lines of evidence and demonstrate that escape from XCI results in sex biases in gene expression, establishing incomplete XCI as a mechanism that is likely to introduce phenotypic diversity. Overall, this updated catalogue of XCI across human tissues helps to increase our understanding of the extent and impact of the incompleteness in the maintenance of XCI.


Subject(s)
Organ Specificity/genetics , Single-Cell Analysis , X Chromosome Inactivation/genetics , Chromosomes, Human, X/genetics , Female , Genes, X-Linked/genetics , Genome, Human/genetics , Genomics , Humans , Male , Phenotype , Sequence Analysis, RNA , Transcriptome/genetics
10.
Nature ; 542(7640): 186-190, 2017 02 09.
Article in English | MEDLINE | ID: mdl-28146470

ABSTRACT

Height is a highly heritable, classic polygenic trait with approximately 700 common associated variants identified through genome-wide association studies so far. Here, we report 83 height-associated coding variants with lower minor-allele frequencies (in the range of 0.1-4.8%) and effects of up to 2 centimetres per allele (such as those in IHH, STC2, AR and CRISPLD2), greater than ten times the average effect of common variants. In functional follow-up studies, rare height-increasing alleles of STC2 (giving an increase of 1-2 centimetres per allele) compromised proteolytic inhibition of PAPP-A and increased cleavage of IGFBP-4 in vitro, resulting in higher bioavailability of insulin-like growth factors. These 83 height-associated variants overlap genes that are mutated in monogenic growth disorders and highlight new biological candidates (such as ADAMTS3, IL11RA and NOX4) and pathways (such as proteoglycan and glycosaminoglycan synthesis) involved in growth. Our results demonstrate that sufficiently large sample sizes can uncover rare and low-frequency variants of moderate-to-large effect associated with polygenic human phenotypes, and that these variants implicate relevant genes and pathways.


Subject(s)
Body Height/genetics , Gene Frequency/genetics , Genetic Variation/genetics , ADAMTS Proteins/genetics , Adult , Alleles , Cell Adhesion Molecules/genetics , Female , Genome, Human/genetics , Glycoproteins/genetics , Glycoproteins/metabolism , Glycosaminoglycans/biosynthesis , Hedgehog Proteins/genetics , Humans , Intercellular Signaling Peptides and Proteins/genetics , Intercellular Signaling Peptides and Proteins/metabolism , Interferon Regulatory Factors/genetics , Interleukin-11 Receptor alpha Subunit/genetics , Male , Multifactorial Inheritance/genetics , NADPH Oxidase 4 , NADPH Oxidases/genetics , Phenotype , Pregnancy-Associated Plasma Protein-A/metabolism , Procollagen N-Endopeptidase/genetics , Proteoglycans/biosynthesis , Proteolysis , Receptors, Androgen/genetics , Somatomedins/metabolism
11.
PLoS Genet ; 16(10): e1009141, 2020 10.
Article in English | MEDLINE | ID: mdl-33095761

ABSTRACT

The UK Biobank is a very large, prospective population-based cohort study across the United Kingdom. It provides unprecedented opportunities for researchers to investigate the relationship between genotypic information and phenotypes of interest. Multiple regression methods, compared with genome-wide association studies (GWAS), have already been shown to greatly improve the prediction performance for a variety of phenotypes. In high-dimensional settings, the lasso, since its first proposal in statistics, has proved to be an effective method for simultaneous variable selection and estimation. However, the large scale and ultrahigh dimensionality seen in the UK Biobank pose new challenges for applying the lasso method, as many existing algorithms and their implementations are not scalable to large applications. In this paper, we propose a computational framework called batch screening iterative lasso (BASIL) that can take advantage of any existing lasso solver and easily build a scalable solution for very large data, including those that are larger than the memory size. We introduce snpnet, an R package that implements the proposed algorithm on top of glmnet and optimizes for single nucleotide polymorphism (SNP) datasets. It currently supports ℓ1-penalized linear models, logistic regression and the Cox model, and also extends to the elastic net with ℓ1/ℓ2 penalty. We demonstrate results on the UK Biobank dataset, where we achieve competitive predictive performance for all four phenotypes considered (height, body mass index, asthma, high cholesterol) using only a small fraction of the variants compared with other established polygenic risk score methods.


Subject(s)
Asthma/epidemiology , Biological Specimen Banks , Genetics, Population , Genome-Wide Association Study , Algorithms , Asthma/blood , Asthma/genetics , Body Height/genetics , Body Mass Index , Cholesterol/blood , Cohort Studies , Genotype , Humans , Logistic Models , Phenotype , Polymorphism, Single Nucleotide/genetics , Proportional Hazards Models , United Kingdom/epidemiology
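BASIL wraps an existing lasso solver (glmnet, in snpnet) and feeds it batches of screened variants. The sketch below shows only the inner problem that such a solver handles: the lasso objective (1/(2n))‖y - Xb‖² + λ‖b‖₁, solved by cyclic coordinate descent with soft-thresholding, which is the strategy glmnet is built on. It leaves out batch screening, KKT checks, the intercept and standardization, all of which a real implementation needs. The function names and defaults are hypothetical.

```python
from typing import List


def soft_threshold(z: float, gamma: float) -> float:
    """Proximal operator of gamma * |b|: shrink z toward 0 by gamma."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def lasso_coordinate_descent(X: List[List[float]], y: List[float], lam: float,
                             n_iter: int = 200, tol: float = 1e-8) -> List[float]:
    """Minimize (1/(2n)) * ||y - X b||^2 + lam * ||b||_1 by cyclic coordinate descent.

    There is no intercept, so the caller should center y and the columns of X.
    """
    n, p = len(X), len(X[0])
    b = [0.0] * p
    residual = list(y)  # y - X b, with b = 0 at the start
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # an all-zero column can never enter the model
            # inner product of column j with the partial residual that excludes b_j
            rho = sum(X[i][j] * residual[i] for i in range(n)) / n + col_sq[j] * b[j]
            new_bj = soft_threshold(rho, lam) / col_sq[j]
            delta = new_bj - b[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= X[i][j] * delta  # keep residual = y - X b
                b[j] = new_bj
                max_change = max(max_change, abs(delta))
        if max_change < tol:
            break
    return b
```

Raising `lam` sets more coefficients exactly to zero. Solving over a decreasing sequence of `lam` values traces out the lasso path that glmnet and snpnet report.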
12.
PLoS Genet ; 16(11): e1008802, 2020 11.
Article in English | MEDLINE | ID: mdl-33226994

ABSTRACT

The clinical evaluation of a genetic syndrome relies upon recognition of a characteristic pattern of signs or symptoms to guide targeted genetic testing for confirmation of the diagnosis. However, individuals displaying a single phenotype of a complex syndrome may not meet criteria for clinical diagnosis or genetic testing. Here, we present a phenome-wide association study (PheWAS) approach to systematically explore the phenotypic expressivity of common and rare alleles in genes associated with four well-described syndromic diseases (Alagille (AS), Marfan (MS), DiGeorge (DS), and Noonan (NS) syndromes) in the general population. Using human phenotype ontology (HPO) terms, we systematically mapped 60 phenotypes related to AS, MS, DS and NS in 337,198 unrelated white British individuals from the UK Biobank (UKBB) based on their hospital admission records, self-administered questionnaires, and physiological measurements. We performed logistic regression adjusting for age, sex, and the first 5 genetic principal components, for each phenotype and each variant in the target genes (JAG1, NOTCH2, FBN1, PTPN11 and RAS-opathy genes, and genes in the 22q11.2 locus) and performed a gene burden test. Overall, we observed multiple phenotype-genotype correlations, such as the associations of variation in JAG1, FBN1, PTPN11 and SOS2 with diastolic and systolic blood pressure, and pleiotropy among multiple variants in syndromic genes. For example, rs11066309 in PTPN11 was significantly associated with a lower body mass index, an increased risk of hypothyroidism and a smaller size for gestational age, all in concordance with NS-related phenotypes. Similarly, rs589668 in FBN1 was associated with an increase in body height and blood pressure, and a reduced body fat percentage as observed in Marfan syndrome.
Our findings suggest that the spectrum of associations of common and rare variants in genes involved in syndromic diseases can be extended to individual phenotypes within the general population.


Subject(s)
Biological Variation, Population/genetics , Genetic Association Studies/methods , Genome-Wide Association Study/methods , Alagille Syndrome/genetics , Alleles , DiGeorge Syndrome/genetics , Female , Gene Frequency/genetics , Genetic Predisposition to Disease/genetics , Genetic Testing/methods , Genetic Variation/genetics , Humans , Male , Marfan Syndrome/genetics , Noonan Syndrome/genetics , Phenotype , Polymorphism, Single Nucleotide/genetics , United Kingdom , White People/genetics
13.
PLoS Genet ; 16(5): e1008682, 2020 05.
Article in English | MEDLINE | ID: mdl-32369491

ABSTRACT

Protein-altering variants that are protective against human disease provide in vivo validation of therapeutic targets. Here we use genotyping data from UK Biobank (n = 337,151 unrelated White British individuals) and FinnGen (n = 176,899) to conduct a search for protein-altering variants conferring lower intraocular pressure (IOP) and protection against glaucoma. Through rare protein-altering variant association analysis, we find a missense variant in ANGPTL7 in UK Biobank (rs28991009, p.Gln175His, MAF = 0.8%, genotyped in 82,253 individuals with measured IOP and an independent set of 4,238 glaucoma patients and 250,660 controls) that significantly lowers IOP (β = -0.53 and -0.67 mmHg for heterozygotes, -3.40 and -2.37 mmHg for homozygotes, P = 5.96 × 10⁻⁹ and 1.07 × 10⁻¹³ for corneal-compensated and Goldmann-correlated IOP, respectively) and is associated with 34% reduced risk of glaucoma (P = 0.0062). In FinnGen, we identify an ANGPTL7 missense variant at a greater than 50-fold increased frequency in Finland compared with other populations (rs147660927, p.Arg220Cys, MAF Finland = 4.3%), which was genotyped in 6,537 glaucoma patients and 170,362 controls and is associated with a 29% lower glaucoma risk (P = 1.9 × 10⁻¹² for all glaucoma types and also protection against its subtypes including exfoliation, primary open-angle, and primary angle-closure). We further find three rarer variants in UK Biobank, including a protein-truncating variant, which confer a strong composite lowering of IOP (P = 0.0012 and 0.24 for Goldmann-correlated and corneal-compensated IOP, respectively), suggesting the protective mechanism likely resides in the loss of interaction or function. Our results support inhibition or down-regulation of ANGPTL7 as a therapeutic strategy for glaucoma.


Subject(s)
Angiopoietin-like Proteins/genetics , Glaucoma/genetics , Glaucoma/prevention & control , Intraocular Pressure/genetics , Polymorphism, Single Nucleotide , Adult , Aged , Aged, 80 and over , Angiopoietin-Like Protein 7 , Biological Specimen Banks/statistics & numerical data , Case-Control Studies , Cohort Studies , Female , Finland/epidemiology , Gene Frequency , Genetic Predisposition to Disease , Genetics, Population , Genome-Wide Association Study , Glaucoma/epidemiology , Humans , Loss of Function Mutation/genetics , Male , Middle Aged , Mutation, Missense , United Kingdom/epidemiology
14.
Am J Hum Genet ; 105(2): 373-383, 2019 08 01.
Article in English | MEDLINE | ID: mdl-31353025

ABSTRACT

Copy-number variations (CNVs) represent a significant proportion of the genetic differences between individuals and many CNVs associate causally with syndromic disease and clinical outcomes. Here, we characterize the landscape of copy-number variation and their phenome-wide effects in a sample of 472,228 array-genotyped individuals from the UK Biobank. In addition to population-level selection effects against genic loci conferring high mortality, we describe genetic burden from potentially pathogenic and previously uncharacterized CNV loci across more than 3,000 quantitative and dichotomous traits, with separate analyses for common and rare classes of variation. Specifically, we highlight the effects of CNVs at two well-known syndromic loci 16p11.2 and 22q11.2, previously uncharacterized variation at 9p23, and several genic associations in the context of acute coronary artery disease and high body mass index. Our data constitute a deeply contextualized portrait of population-wide burden of copy-number variation, as well as a series of dosage-mediated genic associations across the medical phenome.


Subject(s)
Autistic Disorder/genetics , Chromosome Disorders/genetics , Chromosomes, Human, Pair 9/genetics , Coronary Artery Disease/genetics , DNA Copy Number Variations , DiGeorge Syndrome/genetics , Intellectual Disability/genetics , Phenomics , Polymorphism, Single Nucleotide , Biological Specimen Banks , Case-Control Studies , Chromosome Deletion , Chromosomes, Human, Pair 16/genetics , Female , Genetic Loci , Genetic Predisposition to Disease , Genome-Wide Association Study , Genotype , Humans , Male , Phenotype , United Kingdom
15.
Bioinformatics ; 37(22): 4148-4155, 2021 11 18.
Article in English | MEDLINE | ID: mdl-34146108

ABSTRACT

MOTIVATION: Large-scale and high-dimensional genome sequencing data pose computational challenges. General-purpose optimization tools are usually not optimal in terms of computational and memory performance for genetic data. RESULTS: We develop two efficient solvers for optimization problems arising from large-scale regularized regressions on millions of genetic variants sequenced from hundreds of thousands of individuals. These genetic variants are encoded by the values in the set {0,1,2,NA}. We take advantage of this fact and use two bits to represent each entry in a genetic matrix, which reduces the memory requirement by a factor of 32 compared to a double-precision floating-point representation. Using this representation, we implemented an iteratively reweighted least squares algorithm to solve Lasso regressions on genetic matrices, which we name snpnet-2.0. When the dataset contains many rare variants, the predictors can be encoded in a sparse matrix. We utilize the sparsity in the predictor matrix to further reduce the memory requirement and improve computational speed. Our sparse genetic matrix implementation uses both the compact two-bit representation and a simplified version of the compressed sparse block format so that matrix-vector multiplications can be effectively parallelized on multiple CPU cores. To demonstrate the effectiveness of this representation, we implement an accelerated proximal gradient method to solve group Lasso on these sparse genetic matrices. This solver is named sparse-snpnet, and will also be included as part of the snpnet R package. Our implementation is able to solve Lasso and group Lasso, linear, logistic and Cox regression problems on sparse genetic matrices that contain 1,000,000 variants and almost 100,000 individuals within 10 min and using less than 32GB of memory. AVAILABILITY AND IMPLEMENTATION: https://github.com/rivas-lab/snpnet/tree/compact.


Subject(s)
Biological Specimen Banks , Genome , Humans , Algorithms , Chromosome Mapping , Least-Squares Analysis
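The factor of 32 follows from storing each genotype in 2 bits instead of a 64-bit double (64 / 2 = 32). The sketch below packs values from {0, 1, 2, NA} four to a byte. It then computes a dot product directly from the packed bytes, the basic operation inside a matrix-vector product. The specific bit codes (NA as 0b11) and the choice to treat NA as 0 in the dot product are assumptions for illustration, not necessarily what snpnet or PLINK use.

```python
from typing import List, Optional

# Hypothetical 2-bit code assignment chosen for this sketch.
_ENCODE = {0: 0b00, 1: 0b01, 2: 0b10, None: 0b11}
_DECODE = {code: g for g, code in _ENCODE.items()}


def pack_genotypes(genotypes: List[Optional[int]]) -> bytes:
    """Pack genotypes in {0, 1, 2, None (= NA)} at 2 bits each, 4 per byte."""
    out = bytearray((len(genotypes) + 3) // 4)
    for idx, g in enumerate(genotypes):
        out[idx // 4] |= _ENCODE[g] << (2 * (idx % 4))
    return bytes(out)


def unpack_genotypes(packed: bytes, n: int) -> List[Optional[int]]:
    """Inverse of pack_genotypes for the first n entries."""
    return [_DECODE[(packed[idx // 4] >> (2 * (idx % 4))) & 0b11] for idx in range(n)]


def dot_packed(packed: bytes, n: int, v: List[float]) -> float:
    """Dot product of a packed genotype column with a dense vector (NA counted as 0)."""
    total = 0.0
    for idx in range(n):
        g = _DECODE[(packed[idx // 4] >> (2 * (idx % 4))) & 0b11]
        if g is not None:
            total += g * v[idx]
    return total
```

Five genotypes fit in 2 bytes here, compared with 40 bytes as doubles. Decoding on the fly inside `dot_packed` avoids ever building the dense column.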
16.
Bioinformatics ; 37(23): 4437-4443, 2021 12 07.
Article in English | MEDLINE | ID: mdl-33560296

ABSTRACT

MOTIVATION: The prediction performance of the Cox proportional hazards model suffers when there are only a few uncensored events in the training data. RESULTS: We propose a Sparse-Group regularized Cox regression method to improve the prediction performance of large-scale and high-dimensional survival data with few observed events. Our approach is applicable when there are one or more other survival responses that (1) have a large number of observed events and (2) share a common set of associated predictors with the rare-event response. This scenario is common in the UK Biobank dataset, where records for a large number of common and less prevalent diseases of the same set of individuals are available. By analyzing these responses together, we hope to achieve higher prediction performance than when they are analyzed individually. To make this approach practical for large-scale data, we developed an accelerated proximal gradient optimization algorithm as well as a screening procedure inspired by Qian et al. AVAILABILITY AND IMPLEMENTATION: https://github.com/rivas-lab/multisnpnet-Cox.


Subject(s)
Algorithms , Humans , Survival Analysis , Proportional Hazards Models , Regression Analysis
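In a proximal gradient method, each iteration first takes a gradient step on the smooth loss (here, the Cox partial likelihood). It then applies the proximal operator of the penalty. For a sparse-group penalty λ₁‖b‖₁ + λ₂ Σ_g ‖b_g‖₂, that operator has a closed form: elementwise soft-thresholding followed by shrinkage of each group's Euclidean norm. The sketch below implements only that operator. The grouping of coefficients (for example, one variant's coefficients across several disease responses) is an assumption here, and the function name is hypothetical.

```python
import math
from typing import List, Sequence


def prox_sparse_group(b: Sequence[float], groups: Sequence[Sequence[int]],
                      step: float, lam1: float, lam2: float) -> List[float]:
    """Proximal operator of step * (lam1 * ||b||_1 + lam2 * sum_g ||b_g||_2).

    First soft-thresholds each entry by step * lam1. Then each group is shrunk
    toward 0 by step * lam2 in Euclidean norm, and set exactly to 0 if its norm
    is at most that amount. Entries outside every group keep the
    soft-thresholded value.
    """
    # 1) lasso part: elementwise soft-thresholding
    u = [math.copysign(max(abs(x) - step * lam1, 0.0), x) for x in b]
    # 2) group part: block soft-thresholding of each group
    out = list(u)
    for g in groups:
        norm = math.sqrt(sum(u[k] ** 2 for k in g))
        scale = max(0.0, 1.0 - step * lam2 / norm) if norm > 0 else 0.0
        for k in g:
            out[k] = scale * u[k]
    return out
```

One (unaccelerated) iteration with step size t would be `prox_sparse_group([bj - t * gj for bj, gj in zip(b, grad)], groups, t, lam1, lam2)`, where `grad` is the loss gradient at `b`. Acceleration adds a momentum-style extrapolation between iterations.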
17.
Nature ; 536(7614): 41-47, 2016 08 04.
Article in English | MEDLINE | ID: mdl-27398621

ABSTRACT

The genetic architecture of common traits, including the number, frequency, and effect sizes of inherited variants that contribute to individual risk, has been long debated. Genome-wide association studies have identified scores of common variants associated with type 2 diabetes, but in aggregate, these explain only a fraction of the heritability of this disease. Here, to test the hypothesis that lower-frequency variants explain much of the remainder, the GoT2D and T2D-GENES consortia performed whole-genome sequencing in 2,657 European individuals with and without diabetes, and exome sequencing in 12,940 individuals from five ancestry groups. To increase statistical power, we expanded the sample size via genotyping and imputation in a further 111,548 subjects. Variants associated with type 2 diabetes after sequencing were overwhelmingly common and most fell within regions previously identified by genome-wide association studies. Comprehensive enumeration of sequence variation is necessary to identify functional alleles that provide important clues to disease pathophysiology, but large-scale sequencing does not support the idea that lower-frequency variants have a major role in predisposition to type 2 diabetes.


Subject(s)
Diabetes Mellitus, Type 2/genetics , Genetic Predisposition to Disease/genetics , Genetic Variation/genetics , Alleles , DNA Mutational Analysis , Europe/ethnology , Exome , Genome-Wide Association Study , Genotyping Techniques , Humans , Sample Size
18.
Nature ; 536(7616): 285-91, 2016 08 18.
Article in English | MEDLINE | ID: mdl-27535533

ABSTRACT

Large-scale reference data sets of human genetic variation are critical for the medical and functional interpretation of DNA sequence changes. Here we describe the aggregation and analysis of high-quality exome (protein-coding region) DNA sequence data for 60,706 individuals of diverse ancestries generated as part of the Exome Aggregation Consortium (ExAC). This catalogue of human genetic diversity contains an average of one variant every eight bases of the exome, and provides direct evidence for the presence of widespread mutational recurrence. We have used this catalogue to calculate objective metrics of pathogenicity for sequence variants, and to identify genes subject to strong selection against various classes of mutation; identifying 3,230 genes with near-complete depletion of predicted protein-truncating variants, with 72% of these genes having no currently established human disease phenotype. Finally, we demonstrate that these data can be used for the efficient filtering of candidate disease-causing variants, and for the discovery of human 'knockout' variants in protein-coding genes.


Subject(s)
Exome/genetics , Genetic Variation/genetics , DNA Mutational Analysis , Datasets as Topic , Humans , Phenotype , Proteome/genetics , Rare Diseases/genetics , Sample Size
20.
Gut ; 70(2): 285-296, 2021 02.
Article in English | MEDLINE | ID: mdl-32651235

ABSTRACT

OBJECTIVE: Both the gut microbiome and host genetics are known to play significant roles in the pathogenesis of IBD. However, the interaction between these two factors and its implications in the aetiology of IBD remain underexplored. Here, we report on the influence of host genetics on the gut microbiome in IBD. DESIGN: To evaluate the impact of host genetics on the gut microbiota of patients with IBD, we combined whole exome sequencing of the host genome and whole genome shotgun sequencing of 1464 faecal samples from 525 patients with IBD and 939 population-based controls. We followed a four-step analysis: (1) exome-wide microbial quantitative trait loci (mbQTL) analyses, (2) a targeted approach focusing on IBD-associated genomic regions and protein truncating variants (PTVs, minor allele frequency (MAF) >5%), (3) gene-based burden tests on PTVs with MAF <5% and exome copy number variations (CNVs) with site frequency <1%, and (4) joint analysis of both cohorts to identify the interactions between disease and host genetics. RESULTS: We identified 12 mbQTLs, including variants in the IBD-associated genes IL17REL, MYRF, SEC16A and WDR78. For example, the decrease of the pathway acetyl-coenzyme A biosynthesis, which is involved in short chain fatty acids production, was associated with variants in the gene MYRF (false discovery rate <0.05). Changes in functional pathways involved in the metabolic potential were also observed in participants carrying rare PTVs or CNVs in CYP2D6, GPR151 and CD160 genes. These genes are known for their function in the immune system. Moreover, interaction analyses confirmed previously known IBD disease-specific mbQTLs in TNFSF15. CONCLUSION: This study highlights that both common and rare genetic variants affecting the immune system are key factors in shaping the gut microbiota in the context of IBD and points towards potential mechanisms for disease treatment.


Subject(s)
Exome Sequencing , Gastrointestinal Microbiome/genetics , Genetic Predisposition to Disease/genetics , Inflammatory Bowel Diseases/etiology , Adaptor Proteins, Signal Transducing/genetics , Adult , Case-Control Studies , DNA Copy Number Variations/genetics , Female , Gene Frequency/genetics , Humans , Inflammatory Bowel Diseases/genetics , Inflammatory Bowel Diseases/microbiology , Male , Membrane Proteins/genetics , Metagenomics , Middle Aged , Quantitative Trait Loci/genetics , Receptors, Interleukin-17/genetics , Transcription Factors/genetics , Vesicular Transport Proteins/genetics
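The abstract does not specify the model behind its gene-based burden tests. As a generic illustration, the sketch below collapses the qualifying variants in one gene into a per-individual allele count, using the MAF threshold of 5% quoted above. It then regresses a quantitative trait, such as a microbial pathway abundance, on that count by ordinary least squares. Covariates, relatedness and multiple-testing correction are omitted, and the function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def burden_scores(genotypes: Sequence[Sequence[int]], mafs: Sequence[float],
                  max_maf: float = 0.05) -> List[int]:
    """Collapse qualifying variants in one gene into a per-individual burden score.

    genotypes[i][j]: alt-allele count (0/1/2) of individual i at variant j.
    Only variants with MAF below max_maf contribute. All variants passed in are
    assumed to be already filtered to the class of interest (e.g. PTVs).
    """
    keep = [j for j, f in enumerate(mafs) if f < max_maf]
    return [sum(row[j] for j in keep) for row in genotypes]


def simple_burden_test(burden: Sequence[float],
                       trait: Sequence[float]) -> Tuple[float, float]:
    """Least-squares slope of trait on burden and its t-statistic (no covariates)."""
    n = len(burden)
    if n < 3:
        raise ValueError("need at least 3 individuals")
    mx = sum(burden) / n
    my = sum(trait) / n
    sxx = sum((x - mx) ** 2 for x in burden)
    if sxx == 0:
        raise ValueError("burden score has no variation")
    sxy = sum((x - mx) * (y - my) for x, y in zip(burden, trait))
    slope = sxy / sxx
    intercept = my - slope * mx
    rss = sum((y - intercept - slope * x) ** 2 for x, y in zip(burden, trait))
    se = math.sqrt(rss / (n - 2) / sxx)
    if se == 0.0:  # perfect fit: the t-statistic is unbounded (or undefined if slope is 0)
        t = math.nan if slope == 0 else math.copysign(math.inf, slope)
    else:
        t = slope / se
    return slope, t
```

The t-statistic would then be compared against a t distribution with n - 2 degrees of freedom. Real analyses usually add covariates such as age, sex and principal components.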