Search | BVS Regional Portal

1.

Efficient storage and regression computation for population-scale genome sequencing studies.

Rivas, Manuel A; Chang, Christopher.

bioRxiv ; 2024 Apr 15.

Article in English | MEDLINE | ID: mdl-38659813

ABSTRACT

In the era of big data in human genetics, large-scale biobanks aggregating genetic data from diverse populations have emerged as important for advancing our understanding of human health and disease. However, the computational and storage demands of whole-genome sequencing (WGS) studies pose significant challenges, especially for researchers from underfunded institutions or developing countries, creating a disparity in research capabilities. We introduce new approaches that significantly enhance computational efficiency and reduce data storage requirements for WGS studies. By developing algorithms for compressed storage of genetic data, focusing particularly on optimizing the representation of rare variants, and designing regression methods tailored for the scale and complexity of WGS data, we significantly lower computational and storage costs. We integrate our approach into PLINK 2.0. The implementation demonstrates considerable reductions in storage space and computational time without compromising analytical accuracy, as evidenced by the application to the All of Us project data. We improve the runtime of an exome-wide association analysis of 19.4 million variants and a single phenotype from 695.35 minutes (approximately 11.5 hours) on a single machine to 1.57 minutes using 30 GB of memory and 50 threads (8.67 minutes using 4 threads). We similarly generalize the approach to multi-phenotype analysis. We anticipate that our approach will enable researchers across the globe to unlock the potential of population biobanks, accelerating the pace of discoveries that can improve our understanding of human health and disease.
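To illustrate why a sparse representation of rare variants can speed up association testing, here is a minimal Python sketch. It assumes a hypothetical layout in which each variant stores only its carriers as `{sample_index: dosage}`; this is not PLINK 2.0's actual file format or code, and the function names are illustrative. Per-phenotype sums are computed once, so the per-variant cost scales with the number of carriers rather than with the sample size.

```python
from math import sqrt

def phenotype_stats(y):
    """Per-phenotype sums, computed once and reused for every variant."""
    return len(y), sum(y), sum(v * v for v in y)

def sparse_linear_score(nonzero, y, stats):
    """Simple linear regression of y on one variant whose genotypes are stored
    sparsely as {sample_index: dosage}; unlisted samples have dosage 0.
    Only carriers are visited, so cost grows with the carrier count."""
    n, sum_y, sum_yy = stats
    sum_g = sum(nonzero.values())
    sum_gg = sum(d * d for d in nonzero.values())
    sum_gy = sum(d * y[i] for i, d in nonzero.items())
    sxx = sum_gg - sum_g * sum_g / n      # centered sum of squares of genotype
    if sxx == 0:
        return None                       # monomorphic variant: nothing to fit
    sxy = sum_gy - sum_g * sum_y / n      # centered genotype-phenotype cross-product
    syy = sum_yy - sum_y * sum_y / n      # centered sum of squares of phenotype
    beta = sxy / sxx
    resid_ss = syy - beta * sxy
    se = sqrt(max(resid_ss, 0.0) / (n - 2) / sxx)
    return beta, se
```

For example, `sparse_linear_score({2: 1, 3: 1, 4: 2}, y, phenotype_stats(y))` touches three entries no matter how long `y` is.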

2.

Integrative machine learning approaches for predicting disease risk using multi-omics data from the UK Biobank.

Aguilar, Oscar; Chang, Cheng; Bismuth, Elsa; Rivas, Manuel A.

bioRxiv ; 2024 Apr 20.

Article in English | MEDLINE | ID: mdl-38659731

ABSTRACT

We train prediction and survival models using multi-omics data for disease risk identification and stratification. Existing work on disease prediction focuses on risk analysis using datasets of individual data types (metabolomic, genomic, demographic), while our study creates an integrated model for disease risk assessment. We compare machine learning models such as Lasso regression, Multi-Layer Perceptron, XGBoost, and AdaBoost to analyze multi-omics data, incorporating ROC-AUC score comparisons for various diseases and feature combinations. Additionally, we train Cox proportional hazards models for each disease to perform survival analysis. Although the integration of multi-omics data significantly improves risk prediction for 8 diseases, we find that the contribution of metabolomic data is marginal when compared with standard demographic, genetic, and biomarker features. Nonetheless, we see that metabolomics is a useful replacement for the standard biomarker panel when it is not readily available.
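The ROC-AUC comparisons above rest on a simple quantity: the probability that a randomly chosen case receives a higher predicted risk than a randomly chosen control, with ties counted as one half. A minimal sketch of that definition follows; it runs in quadratic time for clarity, the function name is hypothetical, and it is not the study's evaluation code.

```python
def roc_auc(scores, labels):
    """ROC-AUC via its Mann-Whitney interpretation: the fraction of
    (case, control) pairs in which the case has the higher score."""
    pos = [s for s, lab in zip(scores, labels) if lab == 1]
    neg = [s for s, lab in zip(scores, labels) if lab == 0]
    if not pos or not neg:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5  # tied scores count as half a win
    return wins / (len(pos) * len(neg))
```

Comparing feature sets then amounts to computing `roc_auc` on held-out predictions from each model.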

3.

SGLT2 inhibitor ameliorates endothelial dysfunction associated with the common ALDH2 alcohol flushing variant.

Guo, Hongchao; Yu, Xuan; Liu, Yu; Paik, David T; Justesen, Johanne Marie; Chandy, Mark; Jahng, James W S; Zhang, Tiejun; Wu, Weijun; Rwere, Freeborn; Zhao, Shane Rui; Pokhrel, Suman; Shivnaraine, Rabindra V; Mukherjee, Souhrid; Simon, Daniel J; Manhas, Amit; Zhang, Angela; Chen, Che-Hong; Rivas, Manuel A; Gross, Eric R; Mochly-Rosen, Daria; Wu, Joseph C.

Sci Transl Med ; 15(680): eabp9952, 2023 01 25.

Article in English | MEDLINE | ID: mdl-36696485

ABSTRACT

The common aldehyde dehydrogenase 2 (ALDH2) alcohol flushing variant known as ALDH2*2 affects ~8% of the world's population. Even in heterozygous carriers, this missense variant leads to a severe loss of ALDH2 enzymatic activity and has been linked to an increased risk of coronary artery disease (CAD). Endothelial cell (EC) dysfunction plays a determining role in all stages of CAD pathogenesis, including early-onset CAD. However, the contribution of ALDH2*2 to EC dysfunction and its relation to CAD are not fully understood. In a large genome-wide association study (GWAS) from Biobank Japan, ALDH2*2 was found to be one of the strongest single-nucleotide polymorphisms associated with CAD. Clinical assessment of endothelial function showed that human participants carrying ALDH2*2 exhibited impaired vasodilation after light alcohol drinking. Using human induced pluripotent stem cell-derived ECs (iPSC-ECs) and CRISPR-Cas9-corrected ALDH2*2 iPSC-ECs, we modeled ALDH2*2-induced EC dysfunction in vitro, demonstrating an increase in oxidative stress and inflammatory markers and a decrease in nitric oxide (NO) production and tube formation capacity, which was further exacerbated by ethanol exposure. We subsequently found that sodium-glucose cotransporter 2 inhibitors (SGLT2i) such as empagliflozin mitigated ALDH2*2-associated EC dysfunction. Studies in ALDH2*2 knock-in mice further demonstrated that empagliflozin attenuated ALDH2*2-mediated vascular dysfunction in vivo. Mechanistically, empagliflozin inhibited Na+/H+-exchanger 1 (NHE-1) and activated AKT kinase and endothelial NO synthase (eNOS) pathways to ameliorate ALDH2*2-induced EC dysfunction. Together, our results suggest that ALDH2*2 induces EC dysfunction and that SGLT2i may potentially be used as a preventative measure against CAD for ALDH2*2 carriers.

Subjects

Coronary Artery Disease , Induced Pluripotent Stem Cells , Sodium-Glucose Transporter 2 Inhibitors , Humans , Mice , Animals , Aldehyde Dehydrogenase, Mitochondrial/genetics , Genome-Wide Association Study , Induced Pluripotent Stem Cells/metabolism , Aldehyde Dehydrogenase

4.

LARGE-SCALE MULTIVARIATE SPARSE REGRESSION WITH APPLICATIONS TO UK BIOBANK.

Qian, Junyang; Tanigawa, Yosuke; Li, Ruilin; Tibshirani, Robert; Rivas, Manuel A; Hastie, Trevor.

Ann Appl Stat ; 16(3): 1891-1918, 2022 Sep.

Article in English | MEDLINE | ID: mdl-36091495

ABSTRACT

In high-dimensional regression problems, often a relatively small subset of the features is relevant for predicting the outcome, and methods that impose sparsity on the solution are popular. When multiple correlated outcomes are available (multitask), reduced rank regression is an effective way to borrow strength and capture latent structures that underlie the data. Our proposal is motivated by the UK Biobank population-based cohort study, where we are faced with large-scale, ultrahigh-dimensional features, and have access to a large number of outcomes (phenotypes): lifestyle measures, biomarkers, and disease outcomes. We are hence led to fit sparse reduced-rank regression models, using computational strategies that allow us to scale to problems of this size. We use a scheme that alternates between solving the sparse regression problem and solving the reduced rank decomposition. For the sparse regression component we propose a scalable iterative algorithm based on adaptive screening that leverages the sparsity assumption and enables us to focus on solving much smaller subproblems. The full solution is reconstructed and tested via an optimality condition to make sure it is a valid solution for the original problem. We further extend the method to cope with practical issues, such as the inclusion of confounding variables and imputation of missing values among the phenotypes. Experiments on both synthetic data and the UK Biobank data demonstrate the effectiveness of the method and the algorithm. We present the multiSnpnet package, available at http://github.com/junyangq/multiSnpnet, which works on top of PLINK2 files and which we anticipate will be a valuable tool for generating polygenic risk scores from human genetic studies.
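The alternating scheme can be sketched in its simplest, rank-1 form, where Y ≈ X b a^T with a sparse coefficient vector b and a unit-norm phenotype loading a. With a fixed, the squared error differs from ||Y a - X b||^2 only by a constant, so b comes from a lasso of Y a on X; with b fixed, the best unit-norm a is Y^T X b rescaled to length 1. This is a hedged illustration under those assumptions: it swaps in plain cyclic coordinate-descent lasso for the paper's adaptive-screening solver, omits intercepts, covariates, and missing-phenotype handling, and the function names are hypothetical rather than the multiSnpnet API.

```python
from math import sqrt

def soft_threshold(z, t):
    """Proximal operator of t*|x|: shrink z toward 0 by t."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0

def lasso_cd(X, y, lam, n_iter=200):
    """Cyclic coordinate descent for (1/2n)||y - Xb||^2 + lam*||b||_1, no intercept."""
    n, p = len(X), len(X[0])
    b = [0.0] * p
    r = list(y)  # running residual y - Xb
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0:
                continue
            # Correlation of column j with the partial residual (b_j added back).
            rho = sum(X[i][j] * r[i] for i in range(n)) / n + col_sq[j] * b[j]
            new = soft_threshold(rho, lam) / col_sq[j]
            delta = new - b[j]
            if delta != 0.0:
                for i in range(n):
                    r[i] -= X[i][j] * delta
                b[j] = new
    return b

def sparse_rank1_rrr(X, Y, lam, n_outer=20):
    """Alternate a lasso step for b with a closed-form step for unit-norm a."""
    n, q, p = len(Y), len(Y[0]), len(X[0])
    a = [1.0 / sqrt(q)] * q
    b = [0.0] * p
    for _ in range(n_outer):
        # Sparse-regression step: regress the projected outcome Y a on X.
        ya = [sum(Y[i][k] * a[k] for k in range(q)) for i in range(n)]
        b = lasso_cd(X, ya, lam)
        # Reduced-rank step: a is proportional to Y^T X b.
        xb = [sum(X[i][j] * b[j] for j in range(p)) for i in range(n)]
        v = [sum(Y[i][k] * xb[i] for i in range(n)) for k in range(q)]
        norm = sqrt(sum(t * t for t in v))
        if norm == 0.0:
            break  # lam is large enough to shrink every coefficient to zero
        a = [t / norm for t in v]
    return b, a
```

The pair (b, a) is identified only up to a joint sign flip; higher ranks replace the normalization step with an orthogonal Procrustes (SVD) step.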

5.

Opportunities and challenges for the use of common controls in sequencing studies.

Wojcik, Genevieve L; Murphy, Jessica; Edelson, Jacob L; Gignoux, Christopher R; Ioannidis, Alexander G; Manning, Alisa; Rivas, Manuel A; Buyske, Steven; Hendricks, Audrey E.

Nat Rev Genet ; 23(11): 665-679, 2022 11.

Article in English | MEDLINE | ID: mdl-35581355

ABSTRACT

Genome-wide association studies using large-scale genome and exome sequencing data have become increasingly valuable in identifying associations between genetic variants and disease, transforming basic research and translational medicine. However, this progress has not been equally shared across all people and conditions, in part due to limited resources. Leveraging publicly available sequencing data as external common controls, rather than sequencing new controls for every study, can better allocate resources by augmenting control sample sizes or providing controls where none existed. However, common control studies must be carefully planned and executed as even small differences in sample ascertainment and processing can result in substantial bias. Here, we discuss challenges and opportunities for the robust use of common controls in high-throughput sequencing studies, including study design, quality control and statistical approaches. Thoughtful generation and use of large and valuable genetic sequencing data sets will enable investigation of a broader and more representative set of conditions, environments and genetic ancestries than otherwise possible.

Subjects

Exome , Genome-Wide Association Study , Exome/genetics , Genetic Predisposition to Disease , High-Throughput Nucleotide Sequencing , Humans , Exome Sequencing

6.

High heritability of ascending aortic diameter and trans-ancestry prediction of thoracic aortic disease.

Tcheandjieu, Catherine; Xiao, Ke; Tejeda, Helio; Lynch, Julie A; Ruotsalainen, Sanni; Bellomo, Tiffany; Palnati, Madhuri; Judy, Renae; Klarin, Derek; Kember, Rachel L; Verma, Shefali; Palotie, Aarno; Daly, Mark; Ritchie, Marylyn; Rader, Daniel J; Rivas, Manuel A; Assimes, Themistocles; Tsao, Philip; Damrauer, Scott; Priest, James R.

Nat Genet ; 54(6): 772-782, 2022 06.

Article in English | MEDLINE | ID: mdl-35637384

ABSTRACT

Enlargement of the aorta is an important risk factor for aortic aneurysm and dissection, a leading cause of morbidity in the developed world. Here we performed automated extraction of ascending aortic diameter from cardiac magnetic resonance images of 36,021 individuals from the UK Biobank, followed by genome-wide association. We identified lead variants across 41 loci, including genes related to cardiovascular development (HAND2, TBX20) and Mendelian forms of thoracic aortic disease (ELN, FBN1). A polygenic score significantly predicted prevalent risk of thoracic aortic aneurysm and the need for surgical intervention for patients with thoracic aneurysm across multiple ancestries within the UK Biobank, FinnGen, the Penn Medicine Biobank and the Million Veteran Program (MVP). Additionally, we highlight the primary causal role of blood pressure in reducing aortic dilation using Mendelian randomization. Overall, our findings provide a roadmap for using genetic determinants of human anatomy to understand cardiovascular development while improving prediction of diseases of the thoracic aorta.

Subjects

Aortic Aneurysm, Thoracic , Aortic Aneurysm , Aortic Dissection , Aortic Dissection/genetics , Aortic Dissection/pathology , Aortic Dissection/surgery , Aorta/pathology , Aortic Aneurysm/pathology , Aortic Aneurysm, Thoracic/genetics , Genome-Wide Association Study , Humans

7.

Integration of rare expression outlier-associated variants improves polygenic risk prediction.

Smail, Craig; Ferraro, Nicole M; Hui, Qin; Durrant, Matthew G; Aguirre, Matthew; Tanigawa, Yosuke; Keever-Keigher, Marissa R; Rao, Abhiram S; Justesen, Johanne M; Li, Xin; Gloudemans, Michael J; Assimes, Themistocles L; Kooperberg, Charles; Reiner, Alexander P; Huang, Jie; O'Donnell, Christopher J; Sun, Yan V; Rivas, Manuel A; Montgomery, Stephen B.

Am J Hum Genet ; 109(6): 1055-1064, 2022 06 02.

Article in English | MEDLINE | ID: mdl-35588732

ABSTRACT

Polygenic risk scores (PRSs) quantify the contribution of multiple genetic loci to an individual's likelihood of a complex trait or disease. However, existing PRSs estimate this likelihood with common genetic variants, excluding the impact of rare variants. Here, we report on a method to identify rare variants associated with outlier gene expression and integrate their impact into PRS predictions for body mass index (BMI), obesity, and bariatric surgery. Between the top and bottom 10%, we observed a 20.8% increase in risk for obesity (p = 3 × 10^-14), 62.3% increase in risk for severe obesity (p = 1 × 10^-6), and median 5.29 years earlier onset for bariatric surgery (p = 0.008), as a function of expression outlier-associated rare variant burden when controlling for common variant PRS. We show that these predictions were more significant than integrating the effects of rare protein-truncating variants (PTVs), observing a mean 19% increase in phenotypic variance explained with expression outlier-associated rare variants when compared with PTVs (p = 2 × 10^-15). We replicated these findings by using data from the Million Veteran Program and demonstrated that PRSs across multiple traits and diseases can benefit from the inclusion of expression outlier-associated rare variants identified through population-scale transcriptome sequencing.
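A PRS is a weighted sum of allele dosages, and a rare-variant burden can be as simple as a count of flagged variants that an individual carries. The sketch below shows both under stated assumptions: the per-variant boolean flag stands in for "expression outlier-associated", missing genotypes (`None`) are simply skipped, and the function names are hypothetical. It is not the study's pipeline.

```python
def polygenic_score(dosages, weights):
    """PRS_i = sum_j w_j * g_ij over non-missing dosages (0, 1, or 2)."""
    return [sum(w * g for w, g in zip(weights, row) if g is not None)
            for row in dosages]

def rare_burden(dosages, is_flagged):
    """Number of flagged rare variants at which each individual carries
    at least one alternate allele; missing genotypes are not counted."""
    return [sum(1 for flag, g in zip(is_flagged, row)
                if flag and g is not None and g > 0)
            for row in dosages]
```

In an analysis like the one described, the burden would enter a regression alongside the common-variant PRS rather than replace it.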

Subjects

Multifactorial Inheritance , Obesity , Body Mass Index , Genetic Predisposition to Disease , Genome-Wide Association Study , Humans , Multifactorial Inheritance/genetics , Obesity/genetics , Phenotype , Risk Factors

8.

Significant sparse polygenic risk scores across 813 traits in UK Biobank.

Tanigawa, Yosuke; Qian, Junyang; Venkataraman, Guhan; Justesen, Johanne Marie; Li, Ruilin; Tibshirani, Robert; Hastie, Trevor; Rivas, Manuel A.

PLoS Genet ; 18(3): e1010105, 2022 03.

Article in English | MEDLINE | ID: mdl-35324888

ABSTRACT

We present a systematic assessment of polygenic risk score (PRS) prediction across more than 1,500 traits using genetic and phenotype data in the UK Biobank. We report 813 sparse PRS models with significant (p < 2.5 × 10^-5) incremental predictive performance when compared against the covariate-only model that considers age, sex, types of genotyping arrays, and the principal component loadings of genotypes. We report a significant correlation between the number of genetic variants selected in the sparse PRS model and the incremental predictive performance (Spearman's ρ = 0.61, p = 2.2 × 10^-59 for quantitative traits; ρ = 0.21, p = 9.6 × 10^-4 for binary traits). The sparse PRS model trained on European individuals showed limited transferability when evaluated on non-European individuals in the UK Biobank. We provide the PRS model weights on the Global Biobank Engine (https://biobankengine.stanford.edu/prs).
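Spearman's ρ, as reported above, is the Pearson correlation computed on ranks, with tied values given their average rank. A minimal sketch (function names are illustrative):

```python
from math import sqrt

def average_ranks(xs):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1  # extend the block of tied values
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Pearson correlation of the rank vectors of x and y."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        raise ValueError("a constant input has undefined rank correlation")
    return cov / sqrt(vx * vy)
```

Because only ranks matter, ρ captures any monotone relationship, such as one between model size and incremental performance, without assuming linearity.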

Subjects

Genome-Wide Association Study , Multifactorial Inheritance , Biological Specimen Banks , Genetic Predisposition to Disease , Humans , Multifactorial Inheritance/genetics , Phenotype , Risk Factors , United Kingdom

9.

Fast Lasso method for large-scale and ultrahigh-dimensional Cox model with applications to UK Biobank.

Li, Ruilin; Chang, Christopher; Justesen, Johanne M; Tanigawa, Yosuke; Qian, Junyang; Hastie, Trevor; Rivas, Manuel A; Tibshirani, Robert.

Biostatistics ; 23(2): 522-540, 2022 04 13.

Article in English | MEDLINE | ID: mdl-32989444

ABSTRACT

We develop a scalable and highly efficient algorithm to fit a Cox proportional hazards model by maximizing the L1-regularized (Lasso) partial likelihood function, based on the Batch Screening Iterative Lasso (BASIL) method developed in Qian and others (2019). Our algorithm is particularly suitable for large-scale and high-dimensional data that do not fit in memory. The output of our algorithm is the full Lasso path, the parameter estimates at all predefined regularization parameters, as well as their validation accuracy measured using the concordance index (C-index) or the validation deviance. To demonstrate the effectiveness of our algorithm, we analyze a large genotype-survival time dataset across 306 disease outcomes from the UK Biobank (Sudlow and others, 2015). We provide a publicly available implementation of the proposed approach for genetics data on top of the PLINK2 package and name it snpnet-Cox.
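The C-index used for validation can be computed directly from Harrell's definition: a pair is comparable when the individual with the shorter time had an observed event, and it is concordant when that individual also has the higher predicted risk. This sketch is an illustration only: it drops pairs with tied times, runs in quadratic time, and is not snpnet-Cox code; the function name is hypothetical.

```python
def harrell_c_index(time, event, risk):
    """Fraction of comparable pairs ordered correctly by predicted risk.
    A pair (i, j) is comparable if time[i] < time[j] and event[i] is observed;
    ties in risk count as half-concordant."""
    concordant = 0.0
    comparable = 0
    n = len(time)
    for i in range(n):
        if not event[i]:
            continue  # a censored individual cannot anchor a comparable pair
        for j in range(n):
            if time[i] < time[j]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering of observed failures.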

Subjects

Algorithms , Biological Specimen Banks , Humans , Likelihood Functions , Proportional Hazards Models , United Kingdom

10.

Corrigendum to: Fast Lasso method for large-scale and ultrahigh-dimensional Cox model with applications to UK Biobank.

Li, Ruilin; Chang, Christopher; Justesen, Johanne M; Tanigawa, Yosuke; Qian, Junyang; Hastie, Trevor; Rivas, Manuel A; Tibshirani, Robert.

Biostatistics ; 23(2): 683, 2022 Apr 13.

Article in English | MEDLINE | ID: mdl-34269393

11.

Bayesian model comparison for rare-variant association studies.

Venkataraman, Guhan Ram; DeBoever, Christopher; Tanigawa, Yosuke; Aguirre, Matthew; Ioannidis, Alexander G; Mostafavi, Hakhamanesh; Spencer, Chris C A; Poterba, Timothy; Bustamante, Carlos D; Daly, Mark J; Pirinen, Matti; Rivas, Manuel A.

Am J Hum Genet ; 108(12): 2354-2367, 2021 12 02.

Article in English | MEDLINE | ID: mdl-34822764

ABSTRACT

Whole-genome sequencing studies applied to large populations or biobanks with extensive phenotyping raise new analytic challenges. The need to consider many variants at a locus or group of genes simultaneously and the potential to study many correlated phenotypes with shared genetic architecture provide opportunities for discovery not addressed by the traditional one variant, one phenotype association study. Here, we introduce a Bayesian model comparison approach called MRP (multiple rare variants and phenotypes) for rare-variant association studies that considers correlation, scale, and direction of genetic effects across a group of genetic variants, phenotypes, and studies, requiring only summary statistic data. We apply our method to exome sequencing data (n = 184,698) across 2,019 traits from the UK Biobank, aggregating signals in genes. MRP demonstrates an ability to recover signals such as associations between PCSK9 and LDL cholesterol levels. We additionally find MRP effective in conducting meta-analyses in exome data. Non-biomarker findings include associations between MC1R and red hair color and skin color, IL17RA and monocyte count, and IQGAP2 and mean platelet volume. Finally, we apply MRP in a multi-phenotype setting; after clustering the 35 biomarker phenotypes based on genetic correlation estimates, we find that joint analysis of these phenotypes results in substantial power gains for gene-trait associations, such as in TNFRSF13B in one of the clusters containing diabetes- and lipid-related traits. Overall, we show that the MRP model comparison approach improves upon useful features from widely used meta-analysis approaches for rare-variant association analyses and prioritizes protective modifiers of disease risk.
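MRP compares models through multivariate normal likelihoods over summary statistics. The scalar special case (one variant, one phenotype, one study) shows the core idea: under the null, the estimate satisfies β̂ ~ N(0, se²); under the alternative, the true effect has a N(0, σ²) prior, so β̂ ~ N(0, se² + σ²). The sketch below assumes this simplification and a user-chosen prior standard deviation σ (`prior_sd`); it is not the MRP implementation, which additionally models correlation, scale, and direction across variants, phenotypes, and studies.

```python
from math import log

def log10_bayes_factor(beta_hat, se, prior_sd):
    """log10 of N(beta_hat; 0, se^2 + prior_sd^2) / N(beta_hat; 0, se^2).

    Closed form: ln BF = 0.5*ln(v0/v1) + 0.5*z^2 * prior_sd^2 / v1,
    where v0 = se^2, v1 = se^2 + prior_sd^2, and z = beta_hat / se.
    """
    v0 = se ** 2                    # sampling variance under the null
    v1 = se ** 2 + prior_sd ** 2    # marginal variance under the alternative
    z2 = beta_hat ** 2 / v0         # squared z-score
    ln_bf = 0.5 * log(v0 / v1) + 0.5 * z2 * (prior_sd ** 2 / v1)
    return ln_bf / log(10)
```

A zero estimate gives a negative log Bayes factor (mild support for the null), while a large z-score gives strong support for the alternative.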

Subjects

Genetic Variation , Genome-Wide Association Study , Models, Genetic , Bayes Theorem , Female , Humans , Male , Phenotype

12.

Association of accelerometer-derived sleep measures with lifetime psychiatric diagnoses: A cross-sectional study of 89,205 participants from the UK Biobank.

Wainberg, Michael; Jones, Samuel E; Beaupre, Lindsay Melhuish; Hill, Sean L; Felsky, Daniel; Rivas, Manuel A; Lim, Andrew S P; Ollila, Hanna M; Tripathy, Shreejoy J.

PLoS Med ; 18(10): e1003782, 2021 10.

Article in English | MEDLINE | ID: mdl-34637446

ABSTRACT

BACKGROUND: Sleep problems are both symptoms of and modifiable risk factors for many psychiatric disorders. Wrist-worn accelerometers enable objective measurement of sleep at scale. Here, we aimed to examine the association of accelerometer-derived sleep measures with psychiatric diagnoses and polygenic risk scores in a large community-based cohort. METHODS AND FINDINGS: In this post hoc cross-sectional analysis of the UK Biobank cohort, 10 interpretable sleep measures (bedtime, wake-up time, sleep duration, wake after sleep onset, sleep efficiency, number of awakenings, duration of longest sleep bout, number of naps, and variability in bedtime and sleep duration) were derived from 7-day accelerometry recordings across 89,205 participants (aged 43 to 79, 56% female, 97% self-reported white) taken between 2013 and 2015. These measures were examined for association with lifetime inpatient diagnoses of major depressive disorder, anxiety disorders, bipolar disorder/mania, and schizophrenia spectrum disorders from any time before the date of accelerometry, as well as polygenic risk scores for major depression, bipolar disorder, and schizophrenia. Covariates consisted of age and season at the time of the accelerometry recording, sex, Townsend deprivation index (an indicator of socioeconomic status), and the top 10 genotype principal components. We found that sleep pattern differences were ubiquitous across diagnoses: each diagnosis was associated with a median of 8.5 of the 10 accelerometer-derived sleep measures, with measures of sleep quality (for instance, sleep efficiency) generally more affected than mere sleep duration. Effect sizes were generally small: for instance, the largest magnitude effect size across the 4 diagnoses was β = -0.11 (95% confidence interval -0.13 to -0.10, p = 3 × 10^-56, FDR = 6 × 10^-55) for the association between lifetime inpatient major depressive disorder diagnosis and sleep efficiency.
Associations largely replicated across ancestries and sexes, and accelerometry-derived measures were concordant with self-reported sleep properties. Limitations include the use of accelerometer-based sleep measurement and the time lag between psychiatric diagnoses and accelerometry. CONCLUSIONS: In this study, we observed that sleep pattern differences are a transdiagnostic feature of individuals with lifetime mental illness, suggesting that they should be considered regardless of diagnosis. Accelerometry provides a scalable way to objectively measure sleep properties in psychiatric clinical research and practice, even across tens of thousands of individuals.

Subjects

Accelerometry/instrumentation , Biological Specimen Banks , Mental Disorders/physiopathology , Sleep/physiology , Adult , Aged , Cohort Studies , Cross-Sectional Studies , Female , Humans , Male , Middle Aged , Multifactorial Inheritance , Reproducibility of Results , Risk Factors , Self Report , United Kingdom

13.

A cross-population atlas of genetic associations for 220 human phenotypes.

Sakaue, Saori; Kanai, Masahiro; Tanigawa, Yosuke; Karjalainen, Juha; Kurki, Mitja; Koshiba, Seizo; Narita, Akira; Konuma, Takahiro; Yamamoto, Kenichi; Akiyama, Masato; Ishigaki, Kazuyoshi; Suzuki, Akari; Suzuki, Ken; Obara, Wataru; Yamaji, Ken; Takahashi, Kazuhisa; Asai, Satoshi; Takahashi, Yasuo; Suzuki, Takao; Shinozaki, Nobuaki; Yamaguchi, Hiroki; Minami, Shiro; Murayama, Shigeo; Yoshimori, Kozo; Nagayama, Satoshi; Obata, Daisuke; Higashiyama, Masahiko; Masumoto, Akihide; Koretsune, Yukihiro; Ito, Kaoru; Terao, Chikashi; Yamauchi, Toshimasa; Komuro, Issei; Kadowaki, Takashi; Tamiya, Gen; Yamamoto, Masayuki; Nakamura, Yusuke; Kubo, Michiaki; Murakami, Yoshinori; Yamamoto, Kazuhiko; Kamatani, Yoichiro; Palotie, Aarno; Rivas, Manuel A; Daly, Mark J; Matsuda, Koichi; Okada, Yukinori.

Nat Genet ; 53(10): 1415-1424, 2021 10.

Article in English | MEDLINE | ID: mdl-34594039

ABSTRACT

Current genome-wide association studies do not yet capture sufficient diversity in populations and scope of phenotypes. To expand an atlas of genetic associations in non-European populations, we conducted 220 deep-phenotype genome-wide association studies (diseases, biomarkers and medication usage) in BioBank Japan (n = 179,000), by incorporating past medical history and text-mining of electronic medical records. Meta-analyses with the UK Biobank and FinnGen (total n = 628,000) identified ~5,000 new loci, which improved the resolution of the genomic map of human traits. This atlas elucidated the landscape of pleiotropy as represented by the major histocompatibility complex locus, where we conducted HLA fine-mapping. Finally, we performed statistical decomposition of matrices of phenome-wide summary statistics, and identified latent genetic components, which pinpointed responsible variants and biological mechanisms underlying current disease classifications across populations. The decomposed components enabled genetically informed subtyping of similar diseases (for example, allergic diseases). Our study suggests a potential avenue for hypothesis-free re-investigation of human diseases through genetics.
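The abstract does not name the decomposition method; a truncated singular value decomposition of a variant-by-phenotype matrix of summary statistics (for example, z-scores) is one common choice and is assumed here. Its leading component can be found with power iteration, as in this minimal sketch (illustrative only, with a hypothetical function name; not the study's pipeline).

```python
from math import sqrt

def leading_component(Z, n_iter=500, tol=1e-12):
    """Leading singular value s and vectors u (variants), v (phenotypes) of Z,
    via power iteration on Z^T Z. Vectors are defined up to a joint sign flip."""
    m, k = len(Z), len(Z[0])
    v = [1.0 / sqrt(k)] * k  # starting phenotype-space vector
    for _ in range(n_iter):
        u = [sum(Z[i][j] * v[j] for j in range(k)) for i in range(m)]  # Z v
        w = [sum(Z[i][j] * u[i] for i in range(m)) for j in range(k)]  # Z^T Z v
        norm = sqrt(sum(x * x for x in w))
        if norm == 0.0:
            raise ValueError("starting vector is orthogonal to the row space of Z")
        w = [x / norm for x in w]
        converged = max(abs(x - y) for x, y in zip(w, v)) < tol
        v = w
        if converged:
            break
    u = [sum(Z[i][j] * v[j] for j in range(k)) for i in range(m)]
    s = sqrt(sum(x * x for x in u))  # leading singular value
    return s, [x / s for x in u], v
```

Later components can be obtained by deflating Z (subtracting s * u v^T) and repeating; the phenotype loadings v are what group related traits.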

Subjects

Genetic Association Studies , Genetic Predisposition to Disease , ABO Blood-Group System/genetics , Biological Specimen Banks , Genetic Loci , Genetic Pleiotropy , Genome-Wide Association Study , Humans , Major Histocompatibility Complex/genetics , Meta-Analysis as Topic , Mutation/genetics , Phenotype

14.

Author Correction: Genetics of 35 blood and urine biomarkers in the UK Biobank.

Sinnott-Armstrong, Nasa; Tanigawa, Yosuke; Amar, David; Mars, Nina; Benner, Christian; Aguirre, Matthew; Venkataraman, Guhan Ram; Wainberg, Michael; Ollila, Hanna M; Kiiskinen, Tuomo; Havulinna, Aki S; Pirruccello, James P; Qian, Junyang; Shcherbina, Anna; Rodriguez, Fatima; Assimes, Themistocles L; Agarwala, Vineeta; Tibshirani, Robert; Hastie, Trevor; Ripatti, Samuli; Pritchard, Jonathan K; Daly, Mark J; Rivas, Manuel A.

Nat Genet ; 53(11): 1622, 2021 Nov.

Article in English | MEDLINE | ID: mdl-34608296

15.

APOC3 genetic variation, serum triglycerides, and risk of coronary artery disease in Asian Indians, Europeans, and other ethnic groups.

Goyal, Shiwali; Tanigawa, Yosuke; Zhang, Weihua; Chai, Jin-Fang; Almeida, Marcio; Sim, Xueling; Lerner, Megan; Chainakul, Juliane; Ramiu, Jonathan Garcia; Seraphin, Chanel; Apple, Blair; Vaughan, April; Muniu, James; Peralta, Juan; Lehman, Donna M; Ralhan, Sarju; Wander, Gurpreet S; Singh, Jai Rup; Mehra, Narinder K; Sidorov, Evgeny; Peyton, Marvin D; Blackett, Piers R; Curran, Joanne E; Tai, E Shyong; van Dam, Rob; Cheng, Ching-Yu; Duggirala, Ravindranath; Blangero, John; Chambers, John C; Sabanayagam, Charumathi; Kooner, Jaspal S; Rivas, Manuel A; Aston, Christopher E; Sanghera, Dharambir K.

Lipids Health Dis ; 20(1): 113, 2021 Sep 21.

Article in English | MEDLINE | ID: mdl-34548093

ABSTRACT

BACKGROUND: Hypertriglyceridemia has emerged as a critical coronary artery disease (CAD) risk factor. Rare loss-of-function (LoF) variants in apolipoprotein C-III have been reported to reduce triglycerides (TG) and to be cardioprotective in American Indians and Europeans. However, data are lacking for other Europeans and non-Europeans. Also, whether genetically increased plasma TG due to ApoC-III is causally associated with increased CAD risk is still unclear and inconsistent. The objectives of this study were to verify the cardioprotective role of six previously reported LoF variants of APOC3 in South Asians and other multi-ethnic cohorts and to evaluate the causal association of TG-raising common variants with increased CAD risk. METHODS: We performed gene-centric and Mendelian randomization analyses and evaluated the role of genetic variation encompassing APOC3 in circulating TG and the risk of developing CAD. RESULTS: One rare LoF variant (rs138326449) with a 37% reduction in TG was associated with lower risk of CAD in Europeans (p = 0.007), but we could not confirm this association in Asian Indians (p = 0.641). Our data could not validate the cardioprotective role of the other five LoF variants analysed. A common variant in APOC3, rs5128, was strongly associated with elevated TG levels (p = 2.8 × 10^-424). Measures of plasma ApoC-III in a small subset of Sikhs revealed 37% higher ApoC-III concentrations among homozygous mutant carriers than among wild-type carriers of rs5128. A genetically instrumented 1-SD increment in plasma TG (15 mg/dL) would cause a mild (3%) increase in the risk of CAD (p = 0.042). CONCLUSIONS: Our results highlight the challenges of including rare-variant information in clinical risk assessment and of generalizing ApoC-III inhibition for treating atherosclerotic disease.
More studies are needed to confirm whether genetically raised TG and ApoC-III concentrations increase CAD risk.
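The abstract does not state which Mendelian randomization estimator was used. One standard choice, assumed here, is the fixed-effect inverse-variance-weighted (IVW) estimate, which combines per-variant Wald ratios (the variant's effect on the outcome divided by its effect on the exposure, here TG). A minimal sketch with illustrative names:

```python
from math import sqrt

def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect IVW causal estimate and its standard error.

    Each variant contributes a Wald ratio beta_outcome / beta_exposure with a
    first-order standard error se_outcome / |beta_exposure|; the ratios are
    averaged with inverse-variance weights.
    """
    num = 0.0
    den = 0.0
    for bx, by, se in zip(beta_exposure, beta_outcome, se_outcome):
        ratio = by / bx
        weight = (bx / se) ** 2  # equals 1 / (se / |bx|)^2
        num += weight * ratio
        den += weight
    return num / den, 1.0 / sqrt(den)
```

For a binary outcome such as CAD, the outcome effects would typically be log odds ratios, so the estimate is a log odds ratio per unit of the exposure.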

Subjects

Apolipoprotein C-III/genetics , Coronary Artery Disease/genetics , Genetic Variation , Aged , Alleles , Coronary Artery Disease/ethnology , Europe/epidemiology , Female , Genetic Association Studies , Genotype , Heterozygote , Humans , India/epidemiology , Male , Mendelian Randomization Analysis , Middle Aged , Mutation , Risk , Sequence Analysis, DNA , Triglycerides/blood

16.

Nonsense-mediated decay is highly stable across individuals and tissues.

Teran, Nicole A; Nachun, Daniel C; Eulalio, Tiffany; Ferraro, Nicole M; Smail, Craig; Rivas, Manuel A; Montgomery, Stephen B.

Am J Hum Genet ; 108(8): 1401-1408, 2021 08 05.

Article in English | MEDLINE | ID: mdl-34216550

ABSTRACT

Precise interpretation of the effects of rare protein-truncating variants (PTVs) is important for accurate determination of variant impact. Current methods for assessing the ability of PTVs to induce nonsense-mediated decay (NMD) focus primarily on the position of the variant in the transcript. We used RNA sequencing of the Genotype-Tissue Expression (GTEx) v8 cohort to compute the efficiency of NMD using allelic imbalance for 2,320 rare (genome aggregation database minor allele frequency ≤ 1%) PTVs across 809 individuals in 49 tissues. We created an interpretable predictive model using penalized logistic regression in order to evaluate the comprehensive influence of variant annotation, tissue, and inter-individual variation on NMD. We found that variant position, allele frequency, the inclusion of ultra-rare and singleton variants, and conservation were predictive of allelic imbalance. Furthermore, we found that NMD effects were highly concordant across tissues and individuals. Due to this high consistency, we demonstrate in silico that utilizing peripheral tissues or cell lines provides accurate prediction of NMD for PTVs.
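The allelic-imbalance signal can be illustrated at a single heterozygous PTV site: without NMD, reads carrying the PTV allele are expected to make up about half of the reads at that site, and NMD lowers that fraction. The sketch below computes the PTV-allele read fraction and an exact two-sided binomial test against 0.5. It is a simplified illustration under that assumption, with hypothetical function names, and not the study's NMD-efficiency metric or its penalized logistic regression model.

```python
from math import comb

def ptv_allele_fraction(ref_reads, alt_reads):
    """Fraction of reads carrying the PTV (alternate) allele at a heterozygous site."""
    total = ref_reads + alt_reads
    if total == 0:
        raise ValueError("no reads at this site")
    return alt_reads / total

def binomial_two_sided_p(k, n, p=0.5):
    """Exact two-sided binomial test: total probability of all outcomes that
    are no more likely than the observed count k (small tolerance for ties)."""
    probs = [comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(n + 1)]
    observed = probs[k]
    return min(1.0, sum(q for q in probs if q <= observed * (1 + 1e-7)))
```

For instance, 10 PTV reads out of 40 give a fraction of 0.25, well below the 0.5 expected without decay; `binomial_two_sided_p(10, 40)` quantifies how surprising that is under balanced expression.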

Subjects

Codon, Nonsense/genetics , Gene Expression Regulation , Genetic Diseases, Inborn/pathology , Genetic Variation , Mutation , Nonsense Mediated mRNA Decay , RNA, Messenger/genetics , Gene Frequency , Genetic Diseases, Inborn/genetics , Humans

17.

Fast numerical optimization for genome sequencing data in population biobanks.

Li, Ruilin; Chang, Christopher; Tanigawa, Yosuke; Narasimhan, Balasubramanian; Hastie, Trevor; Tibshirani, Robert; Rivas, Manuel A.

Bioinformatics ; 37(22): 4148-4155, 2021 11 18.

Article in English | MEDLINE | ID: mdl-34146108

ABSTRACT

MOTIVATION: Large-scale and high-dimensional genome sequencing data pose computational challenges. General-purpose optimization tools are usually not optimal in terms of computational and memory performance for genetic data. RESULTS: We develop two efficient solvers for optimization problems arising from large-scale regularized regressions on millions of genetic variants sequenced from hundreds of thousands of individuals. These genetic variants are encoded by values in the set {0, 1, 2, NA}. We take advantage of this fact and use two bits to represent each entry in a genetic matrix, which reduces memory requirements by a factor of 32 compared with a double-precision floating-point representation. Using this representation, we implemented an iteratively reweighted least squares algorithm to solve Lasso regressions on genetic matrices, which we name snpnet-2.0. When the dataset contains many rare variants, the predictors can be encoded in a sparse matrix. We utilize the sparsity in the predictor matrix to further reduce memory requirements and improve computational speed. Our sparse genetic matrix implementation uses both the compact two-bit representation and a simplified version of the compressed sparse block format so that matrix-vector multiplications can be effectively parallelized on multiple CPU cores. To demonstrate the effectiveness of this representation, we implement an accelerated proximal gradient method to solve group Lasso on these sparse genetic matrices. This solver is named sparse-snpnet and will also be included as part of the snpnet R package. Our implementation is able to solve Lasso and group Lasso, linear, logistic and Cox regression problems on sparse genetic matrices that contain 1,000,000 variants and almost 100,000 individuals within 10 min and using less than 32 GB of memory. AVAILABILITY AND IMPLEMENTATION: https://github.com/rivas-lab/snpnet/tree/compact.
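Because each genotype takes one of four values, {0, 1, 2, NA}, two bits suffice, whereas a 64-bit double uses 32 times as much space. The sketch below packs four genotypes per byte and computes a column-vector dot product (the kind of product used in gradient computations) directly from the packed bytes. The specific bit codes and the treatment of NA as 0 are assumptions for illustration; this is not snpnet's or PLINK 2.0's internal encoding, and the names are hypothetical.

```python
# Hypothetical 2-bit codes (an assumption, not a real file format's layout).
CODE = {0: 0b00, 1: 0b01, 2: 0b10, None: 0b11}  # None stands for NA
DECODE = {v: k for k, v in CODE.items()}

def pack_genotypes(genotypes):
    """Pack four genotypes per byte, lowest-order bits first."""
    packed = bytearray((len(genotypes) + 3) // 4)
    for i, g in enumerate(genotypes):
        packed[i // 4] |= CODE[g] << (2 * (i % 4))
    return packed

def unpack_genotypes(packed, n):
    """Recover the first n genotypes from packed bytes."""
    return [DECODE[(packed[i // 4] >> (2 * (i % 4))) & 0b11] for i in range(n)]

def packed_dot(packed, n, vec, missing_value=0.0):
    """sum_i g_i * vec_i computed from packed bytes; NA contributes missing_value."""
    total = 0.0
    for i in range(n):
        code = (packed[i // 4] >> (2 * (i % 4))) & 0b11
        g = missing_value if code == 0b11 else float(code)
        total += g * vec[i]
    return total
```

Mean imputation (passing the column mean as `missing_value`) is a common alternative to treating NA as 0.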

Subjects

Biological Specimen Banks, Genome, Humans, Algorithms, Chromosome Mapping, Least-Squares Analysis
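
The two-bit genotype encoding described in the abstract above can be illustrated with a minimal Python sketch. The bit codes (0 as 00, 1 as 01, 2 as 10, NA as 11), the function names `pack_genotypes`, `unpack_genotypes` and `packed_dot`, and the choice to skip missing entries in the dot product are illustrative assumptions; this is not the snpnet or PLINK on-disk layout.

```python
from typing import List, Optional, Sequence

# Hypothetical 2-bit codes for illustration only (not the snpnet/PLINK layout).
_CODE = {0: 0b00, 1: 0b01, 2: 0b10, None: 0b11}  # None stands for NA
_DECODE = {code: g for g, code in _CODE.items()}


def pack_genotypes(genotypes: Sequence[Optional[int]]) -> bytes:
    """Pack values from {0, 1, 2, NA} four to a byte (2 bits each)."""
    out = bytearray((len(genotypes) + 3) // 4)
    for i, g in enumerate(genotypes):
        out[i // 4] |= _CODE[g] << (2 * (i % 4))
    return bytes(out)


def unpack_genotypes(packed: bytes, n: int) -> List[Optional[int]]:
    """Recover the first n genotype values from a packed buffer."""
    return [_DECODE[(packed[i // 4] >> (2 * (i % 4))) & 0b11] for i in range(n)]


def packed_dot(packed: bytes, n: int, v: Sequence[float]) -> float:
    """Dot product of a packed genotype column with a dense vector.

    Assumption: NA entries contribute 0 (they are simply skipped).
    """
    total = 0.0
    for i in range(n):
        g = _DECODE[(packed[i // 4] >> (2 * (i % 4))) & 0b11]
        if g is not None:
            total += g * v[i]
    return total


genotypes = [0, 1, 2, None, 2, 0]
packed = pack_genotypes(genotypes)
print(len(packed), "bytes packed vs", 8 * len(genotypes), "bytes as float64")
```

Each entry occupies 2 bits instead of the 64 bits of a double, which is where the factor of 32 comes from; in this six-entry example, padding to whole bytes gives 2 bytes versus 48.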

18.

Time trajectories in the transcriptomic response to exercise - a meta-analysis.

Amar, David; Lindholm, Malene E; Norrbom, Jessica; Wheeler, Matthew T; Rivas, Manuel A; Ashley, Euan A.

Nat Commun ; 12(1): 3471, 2021 06 09.

Article in English | MEDLINE | ID: mdl-34108459

ABSTRACT

Exercise training prevents multiple diseases, yet the molecular mechanisms that drive exercise adaptation are incompletely understood. To address this, we create a computational framework comprising data from skeletal muscle or blood from 43 studies, including 739 individuals before and after exercise or training. Using linear mixed effects meta-regression, we detect specific time patterns and regulatory modulators of the exercise response. Acute and long-term responses are transcriptionally distinct and we identify SMAD3 as a central regulator of the exercise response. Exercise induces a more pronounced inflammatory response in skeletal muscle of older individuals and our models reveal multiple sex-associated responses. We validate seven of our top genes in a separate human cohort. In this work, we provide a powerful resource (www.extrameta.org) that expands the transcriptional landscape of exercise adaptation by extending previously known responses and their regulatory networks, and identifying novel modality-, time-, age-, and sex-associated changes.

Subjects

Exercise/physiology, Transcriptome, Adaptation, Physiological/genetics, Age Factors, Endurance Training, Extracellular Matrix Proteins/genetics, Gene Regulatory Networks, Humans, Inflammation/genetics, Muscle, Skeletal/physiology, Reproducibility of Results, Resistance Training, Smad3 Protein/genetics, Systems Biology, Time Factors

19.

Combining Clinical and Polygenic Risk Improves Stroke Prediction Among Individuals With Atrial Fibrillation.

O'Sullivan, Jack W; Shcherbina, Anna; Justesen, Johanne M; Turakhia, Mintu; Perez, Marco; Wand, Hannah; Tcheandjieu, Catherine; Clarke, Shoa L; Rivas, Manuel A; Ashley, Euan A.

Circ Genom Precis Med ; 14(3): e003168, 2021 06.

Article in English | MEDLINE | ID: mdl-34029116

ABSTRACT

BACKGROUND: Atrial fibrillation (AF) is associated with a five-fold increased risk of ischemic stroke. A portion of this risk is heritable; however, current risk stratification tools (CHA2DS2-VASc) do not include family history or genetic risk. We hypothesized that we could improve ischemic stroke prediction in patients with AF by incorporating polygenic risk scores (PRS). METHODS: Using data from the largest available genome-wide association study in Europeans, we combined over half a million genetic variants to construct a PRS to predict ischemic stroke in patients with AF. We externally validated this PRS in independent data from the UK Biobank, both independently and integrated with clinical risk factors. The integrated PRS and clinical risk factors risk tool had the greatest predictive ability. RESULTS: Compared with the currently recommended risk tool (CHA2DS2-VASc), the integrated tool significantly improved Net Reclassification Index (2.3% [95% CI, 1.3%-3.0%]) and fit (χ² P=0.002). Using this improved tool, >115 000 people with AF would have improved risk classification in the United States. Independently, PRS was a significant predictor of ischemic stroke in patients with AF prospectively (hazard ratio, 1.13 per 1 SD [95% CI, 1.06-1.23]). Lastly, polygenic risk scores were uncorrelated with clinical risk factors (Pearson correlation coefficient, -0.018). CONCLUSIONS: In patients with AF, there appears to be a significant association between PRS and risk of ischemic stroke. The greatest predictive ability was found with the integration of PRS and clinical risk factors; however, the prediction of stroke remains challenging.

Subjects

Atrial Fibrillation, Genome-Wide Association Study, Ischemic Stroke, Aged, Atrial Fibrillation/complications, Atrial Fibrillation/genetics, Atrial Fibrillation/physiopathology, Female, Humans, Ischemic Stroke/etiology, Ischemic Stroke/genetics, Ischemic Stroke/physiopathology, Male, Middle Aged, Risk Assessment
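
As a rough illustration of the quantities in the abstract above, a polygenic risk score is commonly computed as a weighted sum of allele dosages, and standardizing it to unit SD is what makes a hazard ratio "per 1 SD" interpretable. The Python sketch below assumes that generic construction; the study's actual variant selection and weights are not described here, and the function names and example weights are hypothetical.

```python
import numpy as np


def polygenic_score(dosages: np.ndarray, weights: np.ndarray) -> np.ndarray:
    """Weighted sum of allele dosages: (n_individuals x n_variants) @ (n_variants,)."""
    return dosages @ weights


def standardize(scores: np.ndarray) -> np.ndarray:
    """Rescale to mean 0 and SD 1 so effects can be reported per 1 SD."""
    return (scores - scores.mean()) / scores.std()


def hazard_ratio_for_shift(hr_per_sd: float, n_sd: float) -> float:
    """Under a log-linear proportional hazards model, HR for an n_sd shift."""
    return hr_per_sd ** n_sd


dosages = np.array([[0, 1, 2], [1, 1, 0], [2, 0, 1]], dtype=float)
weights = np.array([0.1, 0.2, 0.3])  # hypothetical per-variant effect sizes
z = standardize(polygenic_score(dosages, weights))
print(round(hazard_ratio_for_shift(1.13, 2.0), 3))  # prints 1.277
```

With the reported hazard ratio of 1.13 per SD, a score 2 SD above the mean corresponds to a hazard ratio of about 1.13² ≈ 1.28, provided the log-linear proportional hazards relationship holds.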

20.

Survival analysis on rare events using group-regularized multi-response Cox regression.

Li, Ruilin; Tanigawa, Yosuke; Justesen, Johanne M; Taylor, Jonathan; Hastie, Trevor; Tibshirani, Robert; Rivas, Manuel A.

Bioinformatics ; 37(23): 4437-4443, 2021 12 07.

Article in English | MEDLINE | ID: mdl-33560296

ABSTRACT

MOTIVATION: The prediction performance of the Cox proportional hazards model suffers when there are only a few uncensored events in the training data. RESULTS: We propose a Sparse-Group regularized Cox regression method to improve the prediction performance of large-scale and high-dimensional survival data with few observed events. Our approach is applicable when there are one or more other survival responses that (1) have a large number of observed events and (2) share a common set of associated predictors with the rare event response. This scenario is common in the UK Biobank dataset, where records for a large number of common and less prevalent diseases of the same set of individuals are available. By analyzing these responses together, we hope to achieve higher prediction performance than when they are analyzed individually. To make this approach practical for large-scale data, we developed an accelerated proximal gradient optimization algorithm as well as a screening procedure inspired by Qian et al. AVAILABILITY AND IMPLEMENTATION: https://github.com/rivas-lab/multisnpnet-Cox.

Subjects

Algorithms, Humans, Survival Analysis, Proportional Hazards Models, Regression Analysis
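
The accelerated proximal gradient method named in the abstract above alternates a gradient step on a smooth loss with the proximal operator of the penalty. The Python sketch below assumes a sparse-group penalty of the form λ1‖B‖₁ + λ2 Σⱼ‖Bⱼ‖₂, with each group taken to be one row of the coefficient matrix (one predictor's coefficients across all responses). That grouping, the function names, and the least-squares loss in the demo are assumptions for illustration; they stand in for the paper's multi-response Cox partial likelihood, and the screening procedure is not shown.

```python
import numpy as np


def prox_sparse_group(B: np.ndarray, step: float, lam1: float, lam2: float) -> np.ndarray:
    """Prox of step * (lam1 * ||B||_1 + lam2 * sum_j ||B[j, :]||_2).

    Assumption: row j of B is one group (a predictor across all responses).
    """
    # Lasso part: elementwise soft-thresholding.
    S = np.sign(B) * np.maximum(np.abs(B) - step * lam1, 0.0)
    # Group part: shrink each row toward zero according to its Euclidean norm.
    norms = np.linalg.norm(S, axis=1, keepdims=True)
    with np.errstate(divide="ignore", invalid="ignore"):
        scale = np.where(norms > 0, np.maximum(1.0 - step * lam2 / norms, 0.0), 0.0)
    return S * scale


def fista(grad, prox, x0: np.ndarray, step: float, n_iter: int = 500) -> np.ndarray:
    """Accelerated proximal gradient (FISTA) with a fixed step size."""
    x_prev, y, t = x0.copy(), x0.copy(), 1.0
    for _ in range(n_iter):
        x = prox(y - step * grad(y), step)  # proximal gradient step at y
        t_next = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
        y = x + ((t - 1.0) / t_next) * (x - x_prev)  # momentum extrapolation
        x_prev, t = x, t_next
    return x_prev


# Demo on a stand-in smooth loss 0.5 * ||A B - Y||_F^2 (not the Cox likelihood).
A = np.array([[2.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
Y = A @ np.eye(2)
step = 1.0 / np.linalg.eigvalsh(A.T @ A).max()  # 1 / Lipschitz constant of grad
B_hat = fista(
    grad=lambda B: A.T @ (A @ B - Y),
    prox=lambda B, s: prox_sparse_group(B, s, lam1=0.0, lam2=0.0),
    x0=np.zeros((2, 2)),
    step=step,
    n_iter=2000,
)
```

With positive `lam2`, the group step can set an entire row exactly to zero, which is how a group penalty removes a predictor from every response at once.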
