Results 1 - 20 of 31
1.
Cell ; 186(20): 4386-4403.e29, 2023 09 28.
Article in English | MEDLINE | ID: mdl-37774678

ABSTRACT

Altered microglial states affect neuroinflammation, neurodegeneration, and disease but remain poorly understood. Here, we report 194,000 single-nucleus microglial transcriptomes and epigenomes across 443 human subjects and diverse Alzheimer's disease (AD) pathological phenotypes. We annotate 12 microglial transcriptional states, including AD-dysregulated homeostatic, inflammatory, and lipid-processing states. We identify 1,542 AD-differentially-expressed genes, including both microglia-state-specific and disease-stage-specific alterations. By integrating epigenomic, transcriptomic, and motif information, we infer upstream regulators of microglial cell states, gene-regulatory networks, enhancer-gene links, and transcription-factor-driven microglial state transitions. We demonstrate that ectopic expression of our predicted homeostatic-state activators induces homeostatic features in human iPSC-derived microglia-like cells, while inhibiting activators of inflammation can block inflammatory progression. Lastly, we pinpoint the expression of AD-risk genes in microglial states and differential expression of AD-risk genes and their regulators during AD progression. Overall, we provide insights underlying microglial states, including state-specific and AD-stage-specific microglial alterations at unprecedented resolution.


Subject(s)
Alzheimer Disease , Microglia , Humans , Alzheimer Disease/genetics , Alzheimer Disease/pathology , Gene Expression Regulation , Inflammation/pathology , Microglia/metabolism , Transcription Factors/metabolism , Transcriptome , Epigenome
2.
Am J Hum Genet ; 110(11): 1888-1902, 2023 11 02.
Article in English | MEDLINE | ID: mdl-37890495

ABSTRACT

Admixed individuals offer unique opportunities for addressing limited transferability in polygenic scores (PGSs), given the substantial trans-ancestry genetic correlation in many complex traits. However, they are rarely considered in PGS training, given the challenges in representing ancestry-matched linkage-disequilibrium reference panels for admixed individuals. Here we present inclusive PGS (iPGS), which captures ancestry-shared genetic effects by finding the exact solution for penalized regression on individual-level data and is thus naturally applicable to admixed individuals. We validate our approach in a simulation study across 33 configurations with varying heritability, polygenicity, and ancestry composition in the training set. When iPGS is applied to n = 237,055 ancestry-diverse individuals in the UK Biobank, it shows the greatest improvements in Africans by 48.9% on average across 60 quantitative traits and up to 50-fold improvements for some traits (neutrophil count, R² = 0.058) over the baseline model trained on the same number of European individuals. When we allowed iPGS to use n = 284,661 individuals, we observed an average improvement of 60.8% for African, 11.6% for South Asian, 7.3% for non-British White, 4.8% for White British, and 17.8% for the other individuals. We further developed iPGS+refit to jointly model the ancestry-shared and -dependent genetic effects when heterogeneous genetic associations were present. For neutrophil count, for example, iPGS+refit showed the highest predictive performance in the African group (R² = 0.115), which exceeds the best predictive performance for the White British group (R² = 0.090 in the iPGS model), even though only 1.49% of individuals used in the iPGS training are of African ancestry. Our results indicate the power of including diverse individuals for developing more equitable PGS models.


Subject(s)
Multifactorial Inheritance , White People , Humans , Multifactorial Inheritance/genetics , White People/genetics , Phenotype , Black People/genetics , Asian People/genetics , Genome-Wide Association Study/methods
3.
Am J Hum Genet ; 109(6): 1055-1064, 2022 06 02.
Article in English | MEDLINE | ID: mdl-35588732

ABSTRACT

Polygenic risk scores (PRSs) quantify the contribution of multiple genetic loci to an individual's likelihood of a complex trait or disease. However, existing PRSs estimate this likelihood with common genetic variants, excluding the impact of rare variants. Here, we report on a method to identify rare variants associated with outlier gene expression and integrate their impact into PRS predictions for body mass index (BMI), obesity, and bariatric surgery. Between the top and bottom 10%, we observed a 20.8% increase in risk for obesity (p = 3 × 10⁻¹⁴), 62.3% increase in risk for severe obesity (p = 1 × 10⁻⁶), and median 5.29 years earlier onset for bariatric surgery (p = 0.008), as a function of expression outlier-associated rare variant burden when controlling for common variant PRS. We show that these predictions were more significant than integrating the effects of rare protein-truncating variants (PTVs), observing a mean 19% increase in phenotypic variance explained with expression outlier-associated rare variants when compared with PTVs (p = 2 × 10⁻¹⁵). We replicated these findings by using data from the Million Veteran Program and demonstrated that PRSs across multiple traits and diseases can benefit from the inclusion of expression outlier-associated rare variants identified through population-scale transcriptome sequencing.


Subject(s)
Multifactorial Inheritance , Obesity , Body Mass Index , Genetic Predisposition to Disease , Genome-Wide Association Study , Humans , Multifactorial Inheritance/genetics , Obesity/genetics , Phenotype , Risk Factors
4.
PLoS Genet ; 18(3): e1010105, 2022 03.
Article in English | MEDLINE | ID: mdl-35324888

ABSTRACT

We present a systematic assessment of polygenic risk score (PRS) prediction across more than 1,500 traits using genetic and phenotype data in the UK Biobank. We report 813 sparse PRS models with significant (p < 2.5 × 10⁻⁵) incremental predictive performance when compared against the covariate-only model that considers age, sex, types of genotyping arrays, and the principal component loadings of genotypes. We report a significant correlation between the number of genetic variants selected in the sparse PRS model and the incremental predictive performance (Spearman's ρ = 0.61, p = 2.2 × 10⁻⁵⁹ for quantitative traits, ρ = 0.21, p = 9.6 × 10⁻⁴ for binary traits). The sparse PRS model trained on European individuals showed limited transferability when evaluated on non-European individuals in the UK Biobank. We provide the PRS model weights on the Global Biobank Engine (https://biobankengine.stanford.edu/prs).


Subject(s)
Genome-Wide Association Study , Multifactorial Inheritance , Biological Specimen Banks , Genetic Predisposition to Disease , Humans , Multifactorial Inheritance/genetics , Phenotype , Risk Factors , United Kingdom
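
As a rough illustration of how published sparse PRS weights are typically applied, the sketch below scores individuals as a weighted sum of allele dosages. The function name, the toy dosages, and the toy weights are all hypothetical; nothing here is taken from the Global Biobank Engine files, whose exact format is not described in this entry.

```python
import numpy as np

def polygenic_score(dosages, weights):
    """Return one score per individual as sum_j dosage[i, j] * weight[j].

    dosages: (n_individuals, n_variants) allele counts, usually 0, 1, or 2.
    weights: (n_variants,) per-variant effect sizes; in a sparse model most
             entries are exactly zero and contribute nothing to the score.
    """
    dosages = np.asarray(dosages, dtype=float)
    weights = np.asarray(weights, dtype=float)
    if dosages.ndim != 2 or dosages.shape[1] != weights.shape[0]:
        raise ValueError("dosages must be (n, p) and weights must be (p,)")
    return dosages @ weights

# Hypothetical toy data: three variants, the second has zero weight.
toy_weights = np.array([0.5, 0.0, -0.2])
toy_dosages = np.array([[0, 1, 2],
                        [2, 0, 1]])
toy_scores = polygenic_score(toy_dosages, toy_weights)
n_selected = int(np.count_nonzero(toy_weights))  # variants the sparse model kept
```

In practice such a score would be added to a covariate-only model to measure the incremental predictive performance mentioned in the abstract; that comparison is not sketched here.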
5.
Am J Hum Genet ; 108(12): 2354-2367, 2021 12 02.
Article in English | MEDLINE | ID: mdl-34822764

ABSTRACT

Whole-genome sequencing studies applied to large populations or biobanks with extensive phenotyping raise new analytic challenges. The need to consider many variants at a locus or group of genes simultaneously and the potential to study many correlated phenotypes with shared genetic architecture provide opportunities for discovery not addressed by the traditional one variant, one phenotype association study. Here, we introduce a Bayesian model comparison approach called MRP (multiple rare variants and phenotypes) for rare-variant association studies that considers correlation, scale, and direction of genetic effects across a group of genetic variants, phenotypes, and studies, requiring only summary statistic data. We apply our method to exome sequencing data (n = 184,698) across 2,019 traits from the UK Biobank, aggregating signals in genes. MRP demonstrates an ability to recover signals such as associations between PCSK9 and LDL cholesterol levels. We additionally find MRP effective in conducting meta-analyses in exome data. Non-biomarker findings include associations between MC1R and red hair color and skin color, IL17RA and monocyte count, and IQGAP2 and mean platelet volume. Finally, we apply MRP in a multi-phenotype setting; after clustering the 35 biomarker phenotypes based on genetic correlation estimates, we find that joint analysis of these phenotypes results in substantial power gains for gene-trait associations, such as in TNFRSF13B in one of the clusters containing diabetes- and lipid-related traits. Overall, we show that the MRP model comparison approach improves upon useful features from widely used meta-analysis approaches for rare-variant association analyses and prioritizes protective modifiers of disease risk.


Subject(s)
Genetic Variation , Genome-Wide Association Study , Models, Genetic , Bayes Theorem , Female , Humans , Male , Phenotype
6.
Am J Hum Genet ; 106(5): 611-622, 2020 05 07.
Article in English | MEDLINE | ID: mdl-32275883

ABSTRACT

Population-scale biobanks that combine genetic data and high-dimensional phenotyping for a large number of participants provide an exciting opportunity to perform genome-wide association studies (GWAS) to identify genetic variants associated with diverse quantitative traits and diseases. A major challenge for GWAS in population biobanks is ascertaining disease cases from heterogeneous data sources such as hospital records, digital questionnaire responses, or interviews. In this study, we use genetic parameters, including genetic correlation, to evaluate whether GWAS performed using cases in the UK Biobank ascertained from hospital records, questionnaire responses, and family history of disease implicate similar disease genetics across a range of effect sizes. We find that hospital record and questionnaire GWAS largely identify similar genetic effects for many complex phenotypes and that combining both phenotyping methods improves power to detect genetic associations. We also show that family history GWAS using cases ascertained on family history of disease agrees with combined hospital record and questionnaire GWAS and that family history GWAS has better power to detect genetic associations for some phenotypes. Overall, this work demonstrates that digital phenotyping and unstructured phenotype data can be combined with structured data such as hospital records to identify cases for GWAS in biobanks and improve the ability of such studies to identify genetic associations.


Subject(s)
Disease/genetics , Genome-Wide Association Study , Phenotype , Asthma/genetics , Databases, Factual , Female , Genetics, Medical , Genotype , Humans , Male , Neoplasms/genetics , United Kingdom
7.
Biostatistics ; 23(2): 522-540, 2022 04 13.
Article in English | MEDLINE | ID: mdl-32989444

ABSTRACT

We develop a scalable and highly efficient algorithm to fit a Cox proportional hazard model by maximizing the $L^1$-regularized (Lasso) partial likelihood function, based on the Batch Screening Iterative Lasso (BASIL) method developed in Qian and others (2019). Our algorithm is particularly suitable for large-scale and high-dimensional data that do not fit in the memory. The output of our algorithm is the full Lasso path, the parameter estimates at all predefined regularization parameters, as well as their validation accuracy measured using the concordance index (C-index) or the validation deviance. To demonstrate the effectiveness of our algorithm, we analyze a large genotype-survival time dataset across 306 disease outcomes from the UK Biobank (Sudlow and others, 2015). We provide a publicly available implementation of the proposed approach for genetics data on top of the PLINK2 package and name it snpnet-Cox.


Subject(s)
Algorithms , Biological Specimen Banks , Humans , Likelihood Functions , Proportional Hazards Models , United Kingdom
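
The abstract above reports validation accuracy with the concordance index (C-index). The sketch below shows one common definition (Harrell's C for right-censored data) in plain Python. It is an illustrative assumption, not the snpnet-Cox implementation: the function name is hypothetical, tied risk scores are counted as half-concordant, and pairs with tied times are simply skipped.

```python
def concordance_index(time, event, risk):
    """Harrell's C-index for right-censored survival data.

    time:  observed time for each individual (event or censoring time)
    event: 1/True if the event was observed, 0/False if censored
    risk:  predicted risk score; higher should mean earlier event

    A pair (i, j) is comparable when i had an observed event strictly
    before j's observed time. It is concordant when risk[i] > risk[j].
    """
    concordant = 0.0
    comparable = 0
    n = len(time)
    for i in range(n):
        if not event[i]:
            continue  # a censored individual cannot anchor a comparable pair
        for j in range(n):
            if time[i] < time[j]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking. This double loop is quadratic in the number of individuals, so a biobank-scale implementation would need a faster algorithm.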
8.
PLoS Comput Biol ; 18(8): e1010378, 2022 08.
Article in English | MEDLINE | ID: mdl-36040971

ABSTRACT

We present WhichTF, a computational method to identify functionally important transcription factors (TFs) from chromatin accessibility measurements. To rank TFs, WhichTF applies an ontology-guided functional approach to compute novel enrichment by integrating accessibility measurements, high-confidence pre-computed conservation-aware TF binding sites, and putative gene-regulatory models. Comparison with prior sheer abundance-based methods reveals the unique ability of WhichTF to identify context-specific TFs with functional relevance, including NF-κB family members in lymphocytes and GATA factors in cardiac cells. To distinguish the transcriptional regulatory landscape in closely related samples, we apply differential analysis and demonstrate its utility in lymphocyte, mesoderm developmental, and disease cells. We find suggestive, under-characterized TFs, such as RUNX3 in mesoderm development and GLI1 in systemic lupus erythematosus. We also find TFs known for stress response, suggesting routine experimental caveats that warrant careful consideration. WhichTF yields biological insight into known and novel molecular mechanisms of TF-mediated transcriptional regulation in diverse contexts, including human and mouse cell types, cell fate trajectories, and disease-associated cells.


Subject(s)
Chromatin , Transcription Factors , Animals , Binding Sites , Chromatin/genetics , Gene Expression Regulation , Humans , Mice , Protein Binding , Transcription Factors/metabolism
9.
PLoS Genet ; 16(10): e1009141, 2020 10.
Article in English | MEDLINE | ID: mdl-33095761

ABSTRACT

The UK Biobank is a very large, prospective population-based cohort study across the United Kingdom. It provides unprecedented opportunities for researchers to investigate the relationship between genotypic information and phenotypes of interest. Multiple regression methods, compared with genome-wide association studies (GWAS), have already been shown to greatly improve the prediction performance for a variety of phenotypes. In high-dimensional settings, the lasso, since its first proposal in statistics, has been proved to be an effective method for simultaneous variable selection and estimation. However, the large-scale and ultrahigh dimension seen in the UK Biobank pose new challenges for applying the lasso method, as many existing algorithms and their implementations are not scalable to large applications. In this paper, we propose a computational framework called batch screening iterative lasso (BASIL) that can take advantage of any existing lasso solver and easily build a scalable solution for very large data, including those that are larger than the memory size. We introduce snpnet, an R package that implements the proposed algorithm on top of glmnet and optimizes for single nucleotide polymorphism (SNP) datasets. It currently supports the ℓ1-penalized linear model, logistic regression, and Cox model, and also extends to the elastic net with ℓ1/ℓ2 penalty. We demonstrate results on the UK Biobank dataset, where we achieve competitive predictive performance for all four phenotypes considered (height, body mass index, asthma, high cholesterol) using only a small fraction of the variants compared with other established polygenic risk score methods.


Subject(s)
Asthma/epidemiology , Biological Specimen Banks , Genetics, Population , Genome-Wide Association Study , Algorithms , Asthma/blood , Asthma/genetics , Body Height/genetics , Body Mass Index , Cholesterol/blood , Cohort Studies , Genotype , Humans , Logistic Models , Phenotype , Polymorphism, Single Nucleotide/genetics , Proportional Hazards Models , United Kingdom/epidemiology
10.
PLoS Genet ; 16(5): e1008682, 2020 05.
Article in English | MEDLINE | ID: mdl-32369491

ABSTRACT

Protein-altering variants that are protective against human disease provide in vivo validation of therapeutic targets. Here we use genotyping data from UK Biobank (n = 337,151 unrelated White British individuals) and FinnGen (n = 176,899) to conduct a search for protein-altering variants conferring lower intraocular pressure (IOP) and protection against glaucoma. Through rare protein-altering variant association analysis, we find a missense variant in ANGPTL7 in UK Biobank (rs28991009, p.Gln175His, MAF = 0.8%, genotyped in 82,253 individuals with measured IOP and an independent set of 4,238 glaucoma patients and 250,660 controls) that significantly lowers IOP (β = -0.53 and -0.67 mmHg for heterozygotes, -3.40 and -2.37 mmHg for homozygotes, P = 5.96 × 10⁻⁹ and 1.07 × 10⁻¹³ for corneal compensated and Goldman-correlated IOP, respectively) and is associated with 34% reduced risk of glaucoma (P = 0.0062). In FinnGen, we identify an ANGPTL7 missense variant at a greater than 50-fold increased frequency in Finland compared with other populations (rs147660927, p.Arg220Cys, MAF Finland = 4.3%), which was genotyped in 6,537 glaucoma patients and 170,362 controls and is associated with a 29% lower glaucoma risk (P = 1.9 × 10⁻¹² for all glaucoma types and also protection against its subtypes including exfoliation, primary open-angle, and primary angle-closure). We further find three rarer variants in UK Biobank, including a protein-truncating variant, which confer a strong composite lowering of IOP (P = 0.0012 and 0.24 for Goldman-correlated and corneal compensated IOP, respectively), suggesting the protective mechanism likely resides in the loss of interaction or function. Our results support inhibition or down-regulation of ANGPTL7 as a therapeutic strategy for glaucoma.


Subject(s)
Angiopoietin-like Proteins/genetics , Glaucoma/genetics , Glaucoma/prevention & control , Intraocular Pressure/genetics , Polymorphism, Single Nucleotide , Adult , Aged , Aged, 80 and over , Angiopoietin-Like Protein 7 , Biological Specimen Banks/statistics & numerical data , Case-Control Studies , Cohort Studies , Female , Finland/epidemiology , Gene Frequency , Genetic Predisposition to Disease , Genetics, Population , Genome-Wide Association Study , Glaucoma/epidemiology , Humans , Loss of Function Mutation/genetics , Male , Middle Aged , Mutation, Missense , United Kingdom/epidemiology
11.
Bioinformatics ; 37(22): 4148-4155, 2021 11 18.
Article in English | MEDLINE | ID: mdl-34146108

ABSTRACT

MOTIVATION: Large-scale and high-dimensional genome sequencing data pose computational challenges. General-purpose optimization tools are usually not optimal in terms of computational and memory performance for genetic data. RESULTS: We develop two efficient solvers for optimization problems arising from large-scale regularized regressions on millions of genetic variants sequenced from hundreds of thousands of individuals. These genetic variants are encoded by the values in the set {0,1,2,NA}. We take advantage of this fact and use two bits to represent each entry in a genetic matrix, which reduces the memory requirement by a factor of 32 compared to a double precision floating point representation. Using this representation, we implemented an iteratively reweighted least squares algorithm to solve Lasso regressions on genetic matrices, which we name snpnet-2.0. When the dataset contains many rare variants, the predictors can be encoded in a sparse matrix. We utilize the sparsity in the predictor matrix to further reduce the memory requirement and computation time. Our sparse genetic matrix implementation uses both the compact two-bit representation and a simplified version of compressed sparse block format so that matrix-vector multiplications can be effectively parallelized on multiple CPU cores. To demonstrate the effectiveness of this representation, we implement an accelerated proximal gradient method to solve group Lasso on these sparse genetic matrices. This solver is named sparse-snpnet, and will also be included as part of the snpnet R package. Our implementation is able to solve Lasso and group Lasso, linear, logistic and Cox regression problems on sparse genetic matrices that contain 1 000 000 variants and almost 100 000 individuals within 10 min and using less than 32 GB of memory. AVAILABILITY AND IMPLEMENTATION: https://github.com/rivas-lab/snpnet/tree/compact.


Subject(s)
Biological Specimen Banks , Genome , Humans , Algorithms , Chromosome Mapping , Least-Squares Analysis
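
The factor of 32 in the abstract above follows from storing each genotype in 2 bits instead of a 64-bit double. The NumPy sketch below packs four genotypes per byte to make that arithmetic concrete. The bit layout, the choice of code 3 for NA, and the function names are assumptions made for illustration; they are not the layout used by snpnet or by PLINK files.

```python
import numpy as np

NA_CODE = 3  # hypothetical 2-bit code for a missing genotype (NA)

def pack_genotypes(codes):
    """Pack genotype codes (0, 1, 2, or NA_CODE) into bytes, four per byte."""
    codes = np.asarray(codes, dtype=np.uint8)
    if codes.ndim != 1 or np.any(codes > 3):
        raise ValueError("expected a 1-D array of codes in {0, 1, 2, 3}")
    pad = (-codes.size) % 4  # pad with zeros up to a multiple of four
    padded = np.concatenate([codes, np.zeros(pad, dtype=np.uint8)])
    quads = padded.reshape(-1, 4)
    # Entry k of each group of four occupies bits 2k and 2k+1 of the byte.
    packed = (quads[:, 0]
              | (quads[:, 1] << 2)
              | (quads[:, 2] << 4)
              | (quads[:, 3] << 6))
    return packed.astype(np.uint8)

def unpack_genotypes(packed, n):
    """Recover the first n genotype codes from a packed byte array."""
    packed = np.asarray(packed, dtype=np.uint8)
    shifts = np.array([0, 2, 4, 6], dtype=np.uint8)
    codes = (packed[:, None] >> shifts) & 3
    return codes.reshape(-1)[:n]

# Eight genotypes: 64 bytes as float64, 2 bytes when packed.
example = np.array([0, 1, 2, NA_CODE, 2, 1, 0, 0], dtype=np.uint8)
ratio = example.astype(np.float64).nbytes // pack_genotypes(example).nbytes
```

Arithmetic on the packed matrix (for example the matrix-vector products mentioned in the abstract) would decode entries on the fly rather than unpacking the whole array; that part is not shown.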
12.
Bioinformatics ; 37(23): 4437-4443, 2021 12 07.
Article in English | MEDLINE | ID: mdl-33560296

ABSTRACT

MOTIVATION: The prediction performance of the Cox proportional hazards model suffers when there are only a few uncensored events in the training data. RESULTS: We propose a Sparse-Group regularized Cox regression method to improve the prediction performance of large-scale and high-dimensional survival data with few observed events. Our approach is applicable when there are one or more other survival responses that (1) have a large number of observed events and (2) share a common set of associated predictors with the rare event response. This scenario is common in the UK Biobank dataset, where records for a large number of common and less prevalent diseases of the same set of individuals are available. By analyzing these responses together, we hope to achieve higher prediction performance than when they are analyzed individually. To make this approach practical for large-scale data, we developed an accelerated proximal gradient optimization algorithm as well as a screening procedure inspired by Qian et al. AVAILABILITY AND IMPLEMENTATION: https://github.com/rivas-lab/multisnpnet-Cox.


Subject(s)
Algorithms , Humans , Survival Analysis , Proportional Hazards Models , Regression Analysis
13.
Mol Psychiatry ; 25(10): 2422-2430, 2020 10.
Article in English | MEDLINE | ID: mdl-30610202

ABSTRACT

Suicide accounts for nearly 800,000 deaths per year worldwide with rates of both deaths and attempts rising. Family studies have estimated substantial heritability of suicidal behavior; however, collecting the sample sizes necessary for successful genetic studies has remained a challenge. We utilized two different approaches in independent datasets to characterize the contribution of common genetic variation to suicide attempt. The first is a patient reported suicide attempt phenotype asked as part of an online mental health survey taken by a subset of participants (n = 157,366) in the UK Biobank. After quality control, we leveraged a genotyped set of unrelated, white British ancestry participants including 2433 cases and 334,766 controls that included those that did not participate in the survey or were not explicitly asked about attempting suicide. The second leveraged electronic health record (EHR) data from the Vanderbilt University Medical Center (VUMC, 2.8 million patients, 3250 cases) and machine learning to derive probabilities of attempting suicide in 24,546 genotyped patients. We identified significant and comparable heritability estimates of suicide attempt from both the patient reported phenotype in the UK Biobank (h²_SNP = 0.035, p = 7.12 × 10⁻⁴) and the clinically predicted phenotype from VUMC (h²_SNP = 0.046, p = 1.51 × 10⁻²). A significant genetic overlap was demonstrated between the two measures of suicide attempt in these independent samples through polygenic risk score analysis (t = 4.02, p = 5.75 × 10⁻⁵) and genetic correlation (r_g = 1.073, SE = 0.36, p = 0.003). Finally, we show significant but incomplete genetic correlation of suicide attempt with insomnia (r_g = 0.34-0.81) as well as several psychiatric disorders (r_g = 0.26-0.79). This work demonstrates the contribution of common genetic variation to suicide attempt. It points to a genetic underpinning to clinically predicted risk of attempting suicide that is similar to the genetic profile from a patient reported outcome. Lastly, it presents an approach for using EHR data and clinical prediction to generate quantitative measures from binary phenotypes that can improve power for genetic studies.


Subject(s)
Genome-Wide Association Study , Machine Learning , Probability , Suicide, Attempted/statistics & numerical data , Biological Specimen Banks , Electronic Health Records , Female , Health Surveys , Humans , Male , Mental Health , Phenotype , Risk Factors , Suicidal Ideation , Tennessee , United Kingdom , White People/genetics
14.
Lipids Health Dis ; 20(1): 113, 2021 Sep 21.
Article in English | MEDLINE | ID: mdl-34548093

ABSTRACT

BACKGROUND: Hypertriglyceridemia has emerged as a critical coronary artery disease (CAD) risk factor. Rare loss-of-function (LoF) variants in apolipoprotein C-III have been reported to reduce triglycerides (TG) and are cardioprotective in American Indians and Europeans. However, there is a lack of data in other Europeans and non-Europeans. Also, whether genetically increased plasma TG due to ApoC-III is causally associated with increased CAD risk is still unclear and inconsistent. The objectives of this study were to verify the cardioprotective role of six previously reported LoF variants of APOC3 in South Asians and other multi-ethnic cohorts and to evaluate the causal association of TG-raising common variants with increased CAD risk. METHODS: We performed gene-centric and Mendelian randomization analyses and evaluated the role of genetic variation encompassing APOC3 in affecting circulating TG and the risk of developing CAD. RESULTS: One rare LoF variant (rs138326449) with a 37% reduction in TG was associated with lowered risk for CAD in Europeans (p = 0.007), but we could not confirm this association in Asian Indians (p = 0.641). Our data could not validate the cardioprotective role of the other five LoF variants analysed. A common variant, rs5128, in APOC3 was strongly associated with elevated TG levels, with a p-value of 2.8 × 10⁻⁴²⁴. Measures of plasma ApoC-III in a small subset of Sikhs revealed a 37% increase in ApoC-III concentrations among homozygous mutant carriers compared with wild-type carriers of rs5128. A genetically instrumented per 1SD increment of plasma TG level of 15 mg/dL would cause a mild increase (3%) in the risk for CAD (p = 0.042). CONCLUSIONS: Our results highlight the challenges of inclusion of rare variant information in clinical risk assessment and the generalizability of implementation of ApoC-III inhibition for treating atherosclerotic disease. More studies would be needed to confirm whether genetically raised TG and ApoC-III concentrations would increase CAD risk.


Subject(s)
Apolipoprotein C-III/genetics , Coronary Artery Disease/genetics , Genetic Variation , Aged , Alleles , Coronary Artery Disease/ethnology , Europe/epidemiology , Female , Genetic Association Studies , Genotype , Heterozygote , Humans , India/epidemiology , Male , Mendelian Randomization Analysis , Middle Aged , Mutation , Risk , Sequence Analysis, DNA , Triglycerides/blood
15.
Bioinformatics ; 35(14): 2495-2497, 2019 07 15.
Article in English | MEDLINE | ID: mdl-30520965

ABSTRACT

SUMMARY: Large biobanks linking phenotype to genotype have led to an explosion of genetic association studies across a wide range of phenotypes. Sharing the knowledge generated by these resources with the scientific community remains a challenge due to patient privacy and the vast amount of data. Here, we present Global Biobank Engine (GBE), a web-based tool that enables exploration of the relationship between genotype and phenotype in biobank cohorts, such as the UK Biobank. GBE supports browsing for results from genome-wide association studies, phenome-wide association studies, gene-based tests and genetic correlation between phenotypes. We envision GBE as a platform that facilitates the dissemination of summary statistics from biobanks to the scientific and clinical communities. AVAILABILITY AND IMPLEMENTATION: GBE currently hosts data from the UK Biobank and can be found freely available at biobankengine.stanford.edu.


Subject(s)
Biological Specimen Banks , Genome-Wide Association Study , Genotype , Humans , Phenomics , Phenotype
17.
J Plant Res ; 131(4): 709-717, 2018 Jul.
Article in English | MEDLINE | ID: mdl-29460198

ABSTRACT

Recent studies have shown that environmental DNA is found almost everywhere. Flower petal surfaces are an attractive tissue to use for investigation of the dispersal of environmental DNA in nature as they are isolated from the external environment until the bud opens and only then can the petal surface accumulate environmental DNA. Here, we performed a crowdsourced experiment, the "Ohanami Project", to obtain environmental DNA samples from petal surfaces of Cerasus × yedoensis 'Somei-yoshino' across the Japanese archipelago during spring 2015. C. × yedoensis is the most popular garden cherry species in Japan and clones of this cultivar bloom simultaneously every spring. Data collection spanned almost every prefecture and totaled 577 DNA samples from 149 collaborators. Preliminary amplicon-sequencing analysis showed the rapid attachment of environmental DNA onto the petal surfaces. Notably, we found DNA of other common plant species in samples obtained from a wide distribution; this DNA likely originated from the pollen of the Japanese cedar. Our analysis supports our belief that petal surfaces after blossoming are a promising target to reveal the dynamics of environmental DNA in nature. The success of our experiment also shows that crowdsourced environmental DNA analyses have considerable value in ecological studies.


Subject(s)
DNA, Plant/genetics , DNA/genetics , Environment , Flowers/genetics , Prunus/genetics , Chloroplasts/genetics , Cyanobacteria/genetics , Flowers/microbiology , Japan , Proteobacteria/genetics , Prunus/microbiology , Sequence Alignment , Sequence Analysis, DNA
18.
Nat Commun ; 15(1): 4433, 2024 May 29.
Article in English | MEDLINE | ID: mdl-38811555

ABSTRACT

Dominance heritability in complex traits has received increasing recognition. However, most polygenic score (PGS) approaches do not incorporate non-additive effects. Here, we present GenoBoost, a flexible PGS modeling framework capable of considering both additive and non-additive effects, specifically focusing on genetic dominance. Building on statistical boosting theory, we derive provably optimal GenoBoost scores and provide its efficient implementation for analyzing large-scale cohorts. We benchmark it against seven commonly used PGS methods and demonstrate its competitive predictive performance. GenoBoost is ranked the best for four traits and second-best for three traits among twelve tested disease outcomes in UK Biobank. We reveal that GenoBoost improves prediction for autoimmune diseases by incorporating non-additive effects localized in the MHC locus and, more broadly, works best in less polygenic traits. We further demonstrate that GenoBoost can infer the mode of genetic inheritance without requiring prior knowledge. For example, GenoBoost finds non-zero genetic dominance effects for 602 of 900 selected genetic variants, resulting in 2.5% improvements in predicting psoriasis cases. Lastly, we show that GenoBoost can prioritize genetic loci with genetic dominance not previously reported in the GWAS catalog. Our results highlight the increased accuracy and biological insights from incorporating non-additive effects in PGS models.


Subject(s)
Genome-Wide Association Study , Models, Genetic , Multifactorial Inheritance , Multifactorial Inheritance/genetics , Humans , Genome-Wide Association Study/methods , Polymorphism, Single Nucleotide , Genetic Predisposition to Disease , Autoimmune Diseases/genetics , Genes, Dominant , Psoriasis/genetics
19.
Ann Appl Stat ; 16(3): 1891-1918, 2022 Sep.
Article in English | MEDLINE | ID: mdl-36091495

ABSTRACT

In high-dimensional regression problems, often a relatively small subset of the features are relevant for predicting the outcome, and methods that impose sparsity on the solution are popular. When multiple correlated outcomes are available (multitask), reduced rank regression is an effective way to borrow strength and capture latent structures that underlie the data. Our proposal is motivated by the UK Biobank population-based cohort study, where we are faced with large-scale, ultrahigh-dimensional features, and have access to a large number of outcomes (phenotypes): lifestyle measures, biomarkers, and disease outcomes. We are hence led to fit sparse reduced-rank regression models, using computational strategies that allow us to scale to problems of this size. We use a scheme that alternates between solving the sparse regression problem and solving the reduced rank decomposition. For the sparse regression component we propose a scalable iterative algorithm based on adaptive screening that leverages the sparsity assumption and enables us to focus on solving much smaller subproblems. The full solution is reconstructed and tested via an optimality condition to make sure it is a valid solution for the original problem. We further extend the method to cope with practical issues, such as the inclusion of confounding variables and imputation of missing values among the phenotypes. Experiments on both synthetic data and the UK Biobank data demonstrate the effectiveness of the method and the algorithm. We present the multiSnpnet package, available at http://github.com/junyangq/multiSnpnet, which works on top of PLINK2 files and which we anticipate to be a valuable tool for generating polygenic risk scores from human genetic studies.
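
The screening idea in this abstract (solve a small subproblem, rebuild the full solution, then test an optimality condition) can be sketched for the simplest case: a single outcome, a single penalty value, and an ordinary lasso. Everything below is a simplified assumption for illustration. The function names, the batch size, and the plain coordinate-descent solver are hypothetical and stand in for the multi-outcome, reduced-rank method and the solvers the authors actually use.

```python
import numpy as np

def lasso_cd(X, y, lam, n_sweeps=500):
    """Coordinate descent for (1/2n)||y - Xb||^2 + lam * ||b||_1."""
    n, p = X.shape
    b = np.zeros(p)
    r = y.astype(float).copy()           # residual y - Xb
    col_sq = (X ** 2).sum(axis=0) / n
    for _ in range(n_sweeps):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            rho = X[:, j] @ r / n + col_sq[j] * b[j]
            new = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]
            r -= X[:, j] * (new - b[j])
            b[j] = new
    return b

def screened_lasso(X, y, lam, batch=2, tol=1e-8):
    """Fit the lasso on a growing subset of features.

    Each round: (1) check the optimality (KKT) condition |X_j' r| / n <= lam
    for every feature outside the subset, (2) if some violate it, add the
    `batch` strongest violators and refit on the subset only.
    """
    n, p = X.shape
    beta = np.zeros(p)
    active = np.zeros(p, dtype=bool)
    for _ in range(p + 1):               # at least one feature is added per round
        grad = np.abs(X.T @ (y - X @ beta)) / n
        violators = np.where(~active & (grad > lam + tol))[0]
        if violators.size == 0:
            return beta, active          # full solution passes the check
        strongest = violators[np.argsort(grad[violators])[::-1][:batch]]
        active[strongest] = True
        beta = np.zeros(p)
        beta[active] = lasso_cd(X[:, active], y, lam)
    raise RuntimeError("screening loop did not terminate")
```

When the check passes, the subset solution padded with zeros satisfies the optimality conditions of the full problem, so only the small subproblem ever has to be solved; the full feature matrix is touched only for the inner products in the check.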

20.
Cell Metab ; 34(10): 1578-1593.e6, 2022 10 04.
Article in English | MEDLINE | ID: mdl-36198295

ABSTRACT

Exercise training is critical for the prevention and treatment of obesity, but its underlying mechanisms remain incompletely understood given the challenge of profiling heterogeneous effects across multiple tissues and cell types. Here, we address this challenge and opposing effects of exercise and high-fat diet (HFD)-induced obesity at single-cell resolution in subcutaneous and visceral white adipose tissue and skeletal muscle in mice with diet and exercise training interventions. We identify a prominent role of mesenchymal stem cells (MSCs) in obesity and exercise-induced tissue adaptation. Among the pathways regulated by exercise and HFD in MSCs across the three tissues, extracellular matrix remodeling and circadian rhythm are the most prominent. Inferred cell-cell interactions implicate within- and multi-tissue crosstalk centered around MSCs. Overall, our work reveals the intricacies and diversity of multi-tissue molecular responses to exercise and obesity and uncovers a previously underappreciated role of MSCs in tissue-specific and multi-tissue beneficial effects of exercise.


Subject(s)
Adipose Tissue , Mesenchymal Stem Cells , Adipose Tissue/metabolism , Animals , Diet, High-Fat , Mesenchymal Stem Cells/metabolism , Mice , Mice, Inbred C57BL , Muscle, Skeletal/metabolism , Obesity/metabolism