
1.

Whole genome sequencing based analysis of inflammation biomarkers in the Trans-Omics for Precision Medicine (TOPMed) consortium.

Jiang, Min-Zhi; Gaynor, Sheila M; Li, Xihao; Van Buren, Eric; Stilp, Adrienne; Buth, Erin; Wang, Fei Fei; Manansala, Regina; Gogarten, Stephanie M; Li, Zilin; Polfus, Linda M; Salimi, Shabnam; Bis, Joshua C; Pankratz, Nathan; Yanek, Lisa R; Durda, Peter; Tracy, Russell P; Rich, Stephen S; Rotter, Jerome I; Mitchell, Braxton D; Lewis, Joshua P; Psaty, Bruce M; Pratte, Katherine A; Silverman, Edwin K; Kaplan, Robert C; Avery, Christy; North, Kari E; Mathias, Rasika A; Faraday, Nauder; Lin, Honghuang; Wang, Biqi; Carson, April P; Norwood, Arnita F; Gibbs, Richard A; Kooperberg, Charles; Lundin, Jessica; Peters, Ulrike; Dupuis, Josée; Hou, Lifang; Fornage, Myriam; Benjamin, Emelia J; Reiner, Alexander P; Bowler, Russell P; Lin, Xihong; Auer, Paul L; Raffield, Laura M.

Hum Mol Genet ; 2024 May 15.

Article En | MEDLINE | ID: mdl-38747556

Inflammation biomarkers can provide valuable insight into the role of inflammatory processes in many diseases and conditions. Sequencing-based analyses of such biomarkers can also serve as an exemplar of the genetic architecture of quantitative traits. To evaluate the biological insight that can be provided by a multi-ancestry, whole-genome-based association study, we performed a comprehensive analysis of 21 inflammation biomarkers from up to 38,465 individuals with whole-genome sequencing from the Trans-Omics for Precision Medicine (TOPMed) program (with varying sample size by trait, where the minimum sample size was n = 737 for MMP-1). We identified 22 distinct single-variant associations across 6 traits (E-selectin, intercellular adhesion molecule 1, interleukin-6, lipoprotein-associated phospholipase A2 activity and mass, and P-selectin) that remained significant after conditioning on previously identified associations for these inflammatory biomarkers. We further expanded upon known biomarker associations by pairing the single-variant analysis with a rare variant set-based analysis that identified 19 significant rare variant set-based associations with 5 traits. These signals were distinct from both significant single-variant association signals within TOPMed and genetic signals observed in prior studies, demonstrating the complementary value of performing both single and rare variant analyses when analyzing quantitative traits. We also confirm several previously reported signals from semi-quantitative proteomics platforms. Many of these signals demonstrate the extensive allelic heterogeneity and ancestry-differentiated variant-trait associations common for inflammation biomarkers, a characteristic we hypothesize will be increasingly observed with well-powered, large-scale analyses of complex traits.

2.

MagicalRsq-X: A cross-cohort transferable genotype imputation quality metric.

Sun, Quan; Yang, Yingxi; Rosen, Jonathan D; Chen, Jiawen; Li, Xihao; Guan, Wyliena; Jiang, Min-Zhi; Wen, Jia; Pace, Rhonda G; Blackman, Scott M; Bamshad, Michael J; Gibson, Ronald L; Cutting, Garry R; O'Neal, Wanda K; Knowles, Michael R; Kooperberg, Charles; Reiner, Alexander P; Raffield, Laura M; Carson, April P; Rich, Stephen S; Rotter, Jerome I; Loos, Ruth J F; Kenny, Eimear; Jaeger, Byron C; Min, Yuan-I; Fuchsberger, Christian; Li, Yun.

Am J Hum Genet ; 111(5): 990-995, 2024 May 02.

Article En | MEDLINE | ID: mdl-38636510

Since genotype imputation was introduced, researchers have been relying on the estimated imputation quality from imputation software to perform post-imputation quality control (QC). However, this quality estimate (denoted as Rsq) performs less well for lower-frequency variants. We recently published MagicalRsq, a machine-learning-based imputation quality calibration, which leverages additional typed markers from the same cohort and outperforms Rsq as a QC metric. In this work, we extended the original MagicalRsq to allow cross-cohort model training and named the new model MagicalRsq-X. We removed the cohort-specific estimated minor allele frequency and included linkage disequilibrium scores and recombination rates as additional features. Leveraging whole-genome sequencing data from TOPMed, specifically participants in the BioMe, JHS, WHI, and MESA studies, we performed comprehensive cross-cohort evaluations for predominantly European and African ancestral individuals based on their inferred global ancestry with the 1000 Genomes and Human Genome Diversity Project data as reference. Our results suggest MagicalRsq-X outperforms Rsq in almost every setting, with 7.3%-14.4% improvement in squared Pearson correlation with true R², corresponding to 85-218 K variant gains. We further developed a metric to quantify the genetic distances of a target cohort relative to a reference cohort and showed that this metric largely explained the performance of MagicalRsq-X models. Finally, we found MagicalRsq-X saved up to 53 known genome-wide significant variants in one of the largest blood cell trait GWASs that would be missed using the original Rsq for QC. In conclusion, MagicalRsq-X shows superiority for post-imputation QC and benefits genetic studies by distinguishing well and poorly imputed lower-frequency variants.

Gene Frequency , Genotype , Polymorphism, Single Nucleotide , Software , Humans , Cohort Studies , Linkage Disequilibrium , Genome-Wide Association Study/methods , Genome, Human , Quality Control , Machine Learning , Whole Genome Sequencing/standards , Whole Genome Sequencing/methods

3.

A statistical framework for powerful multi-trait rare variant analysis in large-scale whole-genome sequencing studies.

Li, Xihao; Chen, Han; Selvaraj, Margaret Sunitha; Van Buren, Eric; Zhou, Hufeng; Wang, Yuxuan; Sun, Ryan; McCaw, Zachary R; Yu, Zhi; Arnett, Donna K; Bis, Joshua C; Blangero, John; Boerwinkle, Eric; Bowden, Donald W; Brody, Jennifer A; Cade, Brian E; Carson, April P; Carlson, Jenna C; Chami, Nathalie; Chen, Yii-Der Ida; Curran, Joanne E; de Vries, Paul S; Fornage, Myriam; Franceschini, Nora; Freedman, Barry I; Gu, Charles; Heard-Costa, Nancy L; He, Jiang; Hou, Lifang; Hung, Yi-Jen; Irvin, Marguerite R; Kaplan, Robert C; Kardia, Sharon L R; Kelly, Tanika; Konigsberg, Iain; Kooperberg, Charles; Kral, Brian G; Li, Changwei; Loos, Ruth J F; Mahaney, Michael C; Martin, Lisa W; Mathias, Rasika A; Minster, Ryan L; Mitchell, Braxton D; Montasser, May E; Morrison, Alanna C; Palmer, Nicholette D; Peyser, Patricia A; Psaty, Bruce M; Raffield, Laura M.

bioRxiv ; 2023 Nov 02.

Article En | MEDLINE | ID: mdl-37961350

Large-scale whole-genome sequencing (WGS) studies have improved our understanding of the contributions of coding and noncoding rare variants to complex human traits. Leveraging association effect sizes across multiple traits in WGS rare variant association analysis can improve statistical power over single-trait analysis, and also detect pleiotropic genes and regions. Existing multi-trait methods have limited ability to perform rare variant analysis of large-scale WGS data. We propose MultiSTAAR, a statistical framework and computationally-scalable analytical pipeline for functionally-informed multi-trait rare variant analysis in large-scale WGS studies. MultiSTAAR accounts for relatedness, population structure and correlation among phenotypes by jointly analyzing multiple traits, and further empowers rare variant association analysis by incorporating multiple functional annotations. We applied MultiSTAAR to jointly analyze three lipid traits (low-density lipoprotein cholesterol, high-density lipoprotein cholesterol and triglycerides) in 61,861 multi-ethnic samples from the Trans-Omics for Precision Medicine (TOPMed) Program. We discovered new associations with lipid traits missed by single-trait analysis, including rare variants within an enhancer of NIPSNAP3A and an intergenic region on chromosome 1.

4.

Type 2 Diabetes Modifies the Association of CAD Genomic Risk Variants With Subclinical Atherosclerosis.

Hasbani, Natalie R; Westerman, Kenneth E; Kwak, Soo Heon; Chen, Han; Li, Xihao; Di Corpo, Daniel; Wessel, Jennifer; Bis, Joshua C; Sarnowski, Chloé; Wu, Peitao; Bielak, Lawrence F; Guo, Xiuqing; Heard-Costa, Nancy; Kinney, Gregory L; Mahaney, Michael C; Montasser, May E; Palmer, Nicholette D; Raffield, Laura M; Terry, James G; Yanek, Lisa R; Bon, Jessica; Bowden, Donald W; Brody, Jennifer A; Duggirala, Ravindranath; Jacobs, David R; Kalyani, Rita R; Lange, Leslie A; Mitchell, Braxton D; Smith, Jennifer A; Taylor, Kent D; Carson, April P; Curran, Joanne E; Fornage, Myriam; Freedman, Barry I; Gabriel, Stacey; Gibbs, Richard A; Gupta, Namrata; Kardia, Sharon L R; Kral, Brian G; Momin, Zeineen; Newman, Anne B; Post, Wendy S; Viaud-Martinez, Karine A; Young, Kendra A; Becker, Lewis C; Bertoni, Alain G; Blangero, John; Carr, John J; Pratte, Katherine; Psaty, Bruce M.

Circ Genom Precis Med ; 16(6): e004176, 2023 Dec.

Article En | MEDLINE | ID: mdl-38014529

BACKGROUND: Individuals with type 2 diabetes (T2D) have an increased risk of coronary artery disease (CAD), but questions remain about the underlying pathology. Identifying which CAD loci are modified by T2D in the development of subclinical atherosclerosis (coronary artery calcification [CAC], carotid intima-media thickness, or carotid plaque) may improve our understanding of the mechanisms leading to the increased CAD in T2D. METHODS: We compared the common and rare variant associations of known CAD loci from the literature on CAC, carotid intima-media thickness, and carotid plaque in up to 29,670 participants, including up to 24,157 normoglycemic controls and 5,513 T2D cases, leveraging whole-genome sequencing data from the Trans-Omics for Precision Medicine program. We included first-order T2D interaction terms in each model to determine whether CAD loci were modified by T2D. The genetic main and interaction effects were assessed using a joint test to determine whether a CAD variant, or gene-based rare variant set, was associated with the respective subclinical atherosclerosis measures and then further determined whether these loci had a significant interaction test. RESULTS: Using a Bonferroni-corrected significance threshold of P<1.6×10⁻⁴, we identified 3 genes (ATP1B1, ARVCF, and LIPG) associated with CAC and 2 genes (ABCG8 and EIF2B2) associated with carotid intima-media thickness and carotid plaque, respectively, through gene-based rare variant set analysis. Both ATP1B1 and ARVCF also had significantly different associations for CAC in T2D cases versus controls. No significant interaction tests were identified through the candidate single-variant analysis. CONCLUSIONS: These results highlight T2D as an important modifier of rare variant associations in CAD loci with CAC.

Atherosclerosis , Coronary Artery Disease , Diabetes Mellitus, Type 2 , Plaque, Atherosclerotic , Humans , Coronary Artery Disease/genetics , Diabetes Mellitus, Type 2/complications , Diabetes Mellitus, Type 2/genetics , Carotid Intima-Media Thickness , Risk Factors , Atherosclerosis/genetics , Genomics

5.

Rare variants in long non-coding RNAs are associated with blood lipid levels in the TOPMed whole-genome sequencing study.

Wang, Yuxuan; Selvaraj, Margaret Sunitha; Li, Xihao; Li, Zilin; Holdcraft, Jacob A; Arnett, Donna K; Bis, Joshua C; Blangero, John; Boerwinkle, Eric; Bowden, Donald W; Cade, Brian E; Carlson, Jenna C; Carson, April P; Chen, Yii-Der Ida; Curran, Joanne E; de Vries, Paul S; Dutcher, Susan K; Ellinor, Patrick T; Floyd, James S; Fornage, Myriam; Freedman, Barry I; Gabriel, Stacey; Germer, Soren; Gibbs, Richard A; Guo, Xiuqing; He, Jiang; Heard-Costa, Nancy; Hidalgo, Bertha; Hou, Lifang; Irvin, Marguerite R; Joehanes, Roby; Kaplan, Robert C; Kardia, Sharon L R; Kelly, Tanika N; Kim, Ryan; Kooperberg, Charles; Kral, Brian G; Levy, Daniel; Li, Changwei; Liu, Chunyu; Lloyd-Jones, Don; Loos, Ruth J F; Mahaney, Michael C; Martin, Lisa W; Mathias, Rasika A; Minster, Ryan L; Mitchell, Braxton D; Montasser, May E; Morrison, Alanna C; Murabito, Joanne M.

Am J Hum Genet ; 110(10): 1704-1717, 2023 Oct 05.

Article En | MEDLINE | ID: mdl-37802043

Long non-coding RNAs (lncRNAs) are known to perform important regulatory functions in lipid metabolism. Large-scale whole-genome sequencing (WGS) studies and new statistical methods for variant set tests now provide an opportunity to assess more associations between rare variants in lncRNA genes and complex traits across the genome. In this study, we used high-coverage WGS from 66,329 participants of diverse ancestries with measurement of blood lipids and lipoproteins (LDL-C, HDL-C, TC, and TG) in the National Heart, Lung, and Blood Institute (NHLBI) Trans-Omics for Precision Medicine (TOPMed) program to investigate the role of lncRNAs in lipid variability. We aggregated rare variants for 165,375 lncRNA genes based on their genomic locations and conducted rare-variant aggregate association tests using the STAAR (variant-set test for association using annotation information) framework. We performed STAAR conditional analysis adjusting for common variants in known lipid GWAS loci and rare-coding variants in nearby protein-coding genes. Our analyses revealed 83 rare lncRNA variant sets significantly associated with blood lipid levels, all of which were located in known lipid GWAS loci (in a ±500-kb window of a Global Lipids Genetics Consortium index variant). Notably, 61 out of 83 signals (73%) were conditionally independent of common regulatory variation and rare protein-coding variation at the same loci. We replicated 34 out of 61 (56%) conditionally independent associations using the independent UK Biobank WGS data. Our results expand the genetic architecture of blood lipids to rare variants in lncRNAs.

RNA, Long Noncoding , Humans , RNA, Long Noncoding/genetics , Genome-Wide Association Study , Precision Medicine , Whole Genome Sequencing/methods , Lipids/genetics , Polymorphism, Single Nucleotide/genetics

6.

Author Correction: Whole-Genome Sequencing Analysis of Human Metabolome in Multi-Ethnic Populations.

Feofanova, Elena V; Brown, Michael R; Alkis, Taryn; Manuel, Astrid M; Li, Xihao; Tahir, Usman A; Li, Zilin; Mendez, Kevin M; Kelly, Rachel S; Qi, Qibin; Chen, Han; Larson, Martin G; Lemaitre, Rozenn N; Morrison, Alanna C; Grieser, Charles; Wong, Kari E; Gerszten, Robert E; Zhao, Zhongming; Lasky-Su, Jessica; Yu, Bing.

Nat Commun ; 14(1): 6611, 2023 Oct 19.

Article En | MEDLINE | ID: mdl-37857625

7.

Whole Genome Sequencing Based Analysis of Inflammation Biomarkers in the Trans-Omics for Precision Medicine (TOPMed) Consortium.

Jiang, Min-Zhi; Gaynor, Sheila M; Li, Xihao; Van Buren, Eric; Stilp, Adrienne; Buth, Erin; Wang, Fei Fei; Manansala, Regina; Gogarten, Stephanie M; Li, Zilin; Polfus, Linda M; Salimi, Shabnam; Bis, Joshua C; Pankratz, Nathan; Yanek, Lisa R; Durda, Peter; Tracy, Russell P; Rich, Stephen S; Rotter, Jerome I; Mitchell, Braxton D; Lewis, Joshua P; Psaty, Bruce M; Pratte, Katherine A; Silverman, Edwin K; Kaplan, Robert C; Avery, Christy; North, Kari; Mathias, Rasika A; Faraday, Nauder; Lin, Honghuang; Wang, Biqi; Carson, April P; Norwood, Arnita F; Gibbs, Richard A; Kooperberg, Charles; Lundin, Jessica; Peters, Ulrike; Dupuis, Josée; Hou, Lifang; Fornage, Myriam; Benjamin, Emelia J; Reiner, Alexander P; Bowler, Russell P; Lin, Xihong; Auer, Paul L; Raffield, Laura M.

bioRxiv ; 2023 Sep 12.

Article En | MEDLINE | ID: mdl-37745480

Inflammation biomarkers can provide valuable insight into the role of inflammatory processes in many diseases and conditions. Sequencing-based analyses of such biomarkers can also serve as an exemplar of the genetic architecture of quantitative traits. To evaluate the biological insight that can be provided by a multi-ancestry, whole-genome-based association study, we performed a comprehensive analysis of 21 inflammation biomarkers from up to 38,465 individuals with whole-genome sequencing from the Trans-Omics for Precision Medicine (TOPMed) program. We identified 22 distinct single-variant associations across 6 traits (E-selectin, intercellular adhesion molecule 1, interleukin-6, lipoprotein-associated phospholipase A2 activity and mass, and P-selectin) that remained significant after conditioning on previously identified associations for these inflammatory biomarkers. We further expanded upon known biomarker associations by pairing the single-variant analysis with a rare variant set-based analysis that identified 19 significant rare variant set-based associations with 5 traits. These signals were distinct from both significant single-variant association signals within TOPMed and genetic signals observed in prior studies, demonstrating the complementary value of performing both single and rare variant analyses when analyzing quantitative traits. We also confirm several previously reported signals from semi-quantitative proteomics platforms. Many of these signals demonstrate the extensive allelic heterogeneity and ancestry-differentiated variant-trait associations common for inflammation biomarkers, a characteristic we hypothesize will be increasingly observed with well-powered, large-scale analyses of complex traits.

8.

WHOLE GENOME SEQUENCING ANALYSIS OF BODY MASS INDEX IDENTIFIES NOVEL AFRICAN ANCESTRY-SPECIFIC RISK ALLELE.

Zhang, Xinruo; Brody, Jennifer A; Graff, Mariaelisa; Highland, Heather M; Chami, Nathalie; Xu, Hanfei; Wang, Zhe; Ferrier, Kendra; Chittoor, Geetha; Josyula, Navya S; Li, Xihao; Li, Zilin; Allison, Matthew A; Becker, Diane M; Bielak, Lawrence F; Bis, Joshua C; Boorgula, Meher Preethi; Bowden, Donald W; Broome, Jai G; Buth, Erin J; Carlson, Christopher S; Chang, Kyong-Mi; Chavan, Sameer; Chiu, Yen-Feng; Chuang, Lee-Ming; Conomos, Matthew P; DeMeo, Dawn L; Du, Margaret; Duggirala, Ravindranath; Eng, Celeste; Fohner, Alison E; Freedman, Barry I; Garrett, Melanie E; Guo, Xiuqing; Haiman, Chris; Heavner, Benjamin D; Hidalgo, Bertha; Hixson, James E; Ho, Yuk-Lam; Hobbs, Brian D; Hu, Donglei; Hui, Qin; Hwu, Chii-Min; Jackson, Rebecca D; Jain, Deepti; Kalyani, Rita R; Kardia, Sharon L R; Kelly, Tanika N; Lange, Ethan M; LeNoir, Michael.

medRxiv ; 2023 Aug 22.

Article En | MEDLINE | ID: mdl-37662265

Obesity is a major public health crisis associated with high mortality rates. Previous genome-wide association studies (GWAS) investigating body mass index (BMI) have largely relied on imputed data from European individuals. This study leveraged whole-genome sequencing (WGS) data from 88,873 participants from the Trans-Omics for Precision Medicine (TOPMed) Program, of whom 51% were of non-European population groups. We discovered 18 BMI-associated signals (P < 5 × 10⁻⁹). Notably, we identified and replicated a novel low-frequency single nucleotide polymorphism (SNP) in MTMR3 that was common in individuals of African descent. Using a diverse study population, we further identified two novel secondary signals in known BMI loci and pinpointed two likely causal variants in the POC5 and DMD loci. Our work demonstrates the benefits of combining WGS and diverse cohorts in expanding the current catalog of variants and genes that confer risk for obesity, bringing us one step closer to personalized medicine.

9.

Rare variants in long non-coding RNAs are associated with blood lipid levels in the TOPMed Whole Genome Sequencing Study.

Wang, Yuxuan; Selvaraj, Margaret Sunitha; Li, Xihao; Li, Zilin; Holdcraft, Jacob A; Arnett, Donna K; Bis, Joshua C; Blangero, John; Boerwinkle, Eric; Bowden, Donald W; Cade, Brian E; Carlson, Jenna C; Carson, April P; Chen, Yii-Der Ida; Curran, Joanne E; de Vries, Paul S; Dutcher, Susan K; Ellinor, Patrick T; Floyd, James S; Fornage, Myriam; Freedman, Barry I; Gabriel, Stacey; Germer, Soren; Gibbs, Richard A; Guo, Xiuqing; He, Jiang; Heard-Costa, Nancy; Hidalgo, Bertha; Hou, Lifang; Irvin, Marguerite R; Joehanes, Roby; Kaplan, Robert C; Kardia, Sharon L R; Kelly, Tanika N; Kim, Ryan; Kooperberg, Charles; Kral, Brian G; Levy, Daniel; Li, Changwei; Liu, Chunyu; Lloyd-Jones, Don; Loos, Ruth J F; Mahaney, Michael C; Martin, Lisa W; Mathias, Rasika A; Minster, Ryan L; Mitchell, Braxton D; Montasser, May E; Morrison, Alanna C; Murabito, Joanne M.

medRxiv ; 2023 Jun 29.

Article En | MEDLINE | ID: mdl-37425772

Long non-coding RNAs (lncRNAs) are known to perform important regulatory functions. Large-scale whole genome sequencing (WGS) studies and new statistical methods for variant set tests now provide an opportunity to assess the associations between rare variants in lncRNA genes and complex traits across the genome. In this study, we used high-coverage WGS from 66,329 participants of diverse ancestries with blood lipid levels (LDL-C, HDL-C, TC, and TG) in the National Heart, Lung, and Blood Institute (NHLBI) Trans-Omics for Precision Medicine (TOPMed) program to investigate the role of lncRNAs in lipid variability. We aggregated rare variants for 165,375 lncRNA genes based on their genomic locations and conducted rare variant aggregate association tests using the STAAR (variant-Set Test for Association using Annotation infoRmation) framework. We performed STAAR conditional analysis adjusting for common variants in known lipid GWAS loci and rare coding variants in nearby protein coding genes. Our analyses revealed 83 rare lncRNA variant sets significantly associated with blood lipid levels, all of which were located in known lipid GWAS loci (in a ±500 kb window of a Global Lipids Genetics Consortium index variant). Notably, 61 out of 83 signals (73%) were conditionally independent of common regulatory variations and rare protein coding variations at the same loci. We replicated 34 out of 61 (56%) conditionally independent associations using the independent UK Biobank WGS data. Our results expand the genetic architecture of blood lipids to rare variants in lncRNA, implicating new therapeutic opportunities.

10.

Whole-Genome Sequencing Analysis of Human Metabolome in Multi-Ethnic Populations.

Feofanova, Elena V; Brown, Michael R; Alkis, Taryn; Manuel, Astrid M; Li, Xihao; Tahir, Usman A; Li, Zilin; Mendez, Kevin M; Kelly, Rachel S; Qi, Qibin; Chen, Han; Larson, Martin G; Lemaitre, Rozenn N; Morrison, Alanna C; Grieser, Charles; Wong, Kari E; Gerszten, Robert E; Zhao, Zhongming; Lasky-Su, Jessica; Yu, Bing.

Nat Commun ; 14(1): 3111, 2023 May 30.

Article En | MEDLINE | ID: mdl-37253714

Circulating metabolite levels may reflect the state of the human organism in health and disease; however, the genetic architecture of metabolites is not fully understood. We have performed a whole-genome sequencing association analysis of both common and rare variants in up to 11,840 multi-ethnic participants from five studies with up to 1,666 circulating metabolites. We have discovered 1,985 novel variant-metabolite associations, and validated 761 locus-metabolite associations reported previously. Seventy-nine novel variant-metabolite associations have been replicated, including three genetic loci located on the X chromosome, demonstrating its involvement in metabolic regulation. Gene-based analyses have provided further support for seven metabolite-replicated loci pairs and their biologically plausible genes. Among those novel replicated variant-metabolite pairs, follow-up analyses have revealed that 26 metabolites have colocalized with 21 tissues, seven metabolite-disease outcome associations have been putatively causal, and seven metabolites might be regulated by plasma protein levels. Our results have depicted the genetic contribution to circulating metabolite levels, providing additional insights into understanding human disease.

Ethnicity , Quantitative Trait Loci , Humans , Ethnicity/genetics , Metabolome/genetics , Genome-Wide Association Study , Polymorphism, Single Nucleotide

11.

Powerful, scalable and resource-efficient meta-analysis of rare variant associations in large whole genome sequencing studies.

Li, Xihao; Quick, Corbin; Zhou, Hufeng; Gaynor, Sheila M; Liu, Yaowu; Chen, Han; Selvaraj, Margaret Sunitha; Sun, Ryan; Dey, Rounak; Arnett, Donna K; Bielak, Lawrence F; Bis, Joshua C; Blangero, John; Boerwinkle, Eric; Bowden, Donald W; Brody, Jennifer A; Cade, Brian E; Correa, Adolfo; Cupples, L Adrienne; Curran, Joanne E; de Vries, Paul S; Duggirala, Ravindranath; Freedman, Barry I; Göring, Harald H H; Guo, Xiuqing; Haessler, Jeffrey; Kalyani, Rita R; Kooperberg, Charles; Kral, Brian G; Lange, Leslie A; Manichaikul, Ani; Martin, Lisa W; McGarvey, Stephen T; Mitchell, Braxton D; Montasser, May E; Morrison, Alanna C; Naseri, Take; O'Connell, Jeffrey R; Palmer, Nicholette D; Peyser, Patricia A; Psaty, Bruce M; Raffield, Laura M; Redline, Susan; Reiner, Alexander P; Reupena, Muagututi'a Sefuiva; Rice, Kenneth M; Rich, Stephen S; Sitlani, Colleen M; Smith, Jennifer A; Taylor, Kent D.

Nat Genet ; 55(1): 154-164, 2023 Jan.

Article En | MEDLINE | ID: mdl-36564505

Meta-analysis of whole genome sequencing/whole exome sequencing (WGS/WES) studies provides an attractive solution to the problem of collecting large sample sizes for discovering rare variants associated with complex phenotypes. Existing rare variant meta-analysis approaches are not scalable to biobank-scale WGS data. Here we present MetaSTAAR, a powerful and resource-efficient rare variant meta-analysis framework for large-scale WGS/WES studies. MetaSTAAR accounts for relatedness and population structure, can analyze both quantitative and dichotomous traits and boosts the power of rare variant tests by incorporating multiple variant functional annotations. Through meta-analysis of four lipid traits in 30,138 ancestrally diverse samples from 14 studies of the Trans Omics for Precision Medicine (TOPMed) Program, we show that MetaSTAAR performs rare variant meta-analysis at scale and produces results comparable to using pooled data. Additionally, we identified several conditionally significant rare variant associations with lipid traits. We further demonstrate that MetaSTAAR is scalable to biobank-scale cohorts through meta-analysis of TOPMed WGS data and UK Biobank WES data of ~200,000 samples.

Genome-Wide Association Study , Lipids , Genome-Wide Association Study/methods , Whole Genome Sequencing/methods , Exome Sequencing , Phenotype , Lipids/genetics

12.

FAVOR: functional annotation of variants online resource and annotator for variation across the human genome.

Zhou, Hufeng; Arapoglou, Theodore; Li, Xihao; Li, Zilin; Zheng, Xiuwen; Moore, Jill; Asok, Abhijith; Kumar, Sushant; Blue, Elizabeth E; Buyske, Steven; Cox, Nancy; Felsenfeld, Adam; Gerstein, Mark; Kenny, Eimear; Li, Bingshan; Matise, Tara; Philippakis, Anthony; Rehm, Heidi L; Sofia, Heidi J; Snyder, Grace; Weng, Zhiping; Neale, Benjamin; Sunyaev, Shamil R; Lin, Xihong.

Nucleic Acids Res ; 51(D1): D1300-D1311, 2023 Jan 06.

Article En | MEDLINE | ID: mdl-36350676

Large biobank-scale whole genome sequencing (WGS) studies are rapidly identifying a multitude of coding and non-coding variants. They provide an unprecedented resource for illuminating the genetic basis of human diseases. Variant functional annotations play a critical role in WGS analysis, result interpretation, and prioritization of disease- or trait-associated causal variants. Existing functional annotation databases have limited scope to perform online queries and functionally annotate the genotype data of large biobank-scale WGS studies. We develop the Functional Annotation of Variants Online Resources (FAVOR) to meet these pressing needs. FAVOR provides a comprehensive multi-faceted variant functional annotation online portal that summarizes and visualizes findings of all possible nine billion single nucleotide variants (SNVs) across the genome. It allows for rapid variant-, gene- and region-level queries of variant functional annotations. FAVOR integrates variant functional information from multiple sources to describe the functional characteristics of variants and facilitates prioritizing plausible causal variants influencing human phenotypes. Furthermore, we provide a scalable annotation tool, FAVORannotator, to functionally annotate large-scale WGS studies and efficiently store the genotype and their variant functional annotation data in a single file using the annotated Genomic Data Structure (aGDS) format, making downstream analysis more convenient. FAVOR and FAVORannotator are available at https://favor.genohub.org.

Genome, Human , Software , Humans , Molecular Sequence Annotation , Genomics , Genotype , Genetic Variation

13.

Whole genome sequence analysis of blood lipid levels in >66,000 individuals.

Selvaraj, Margaret Sunitha; Li, Xihao; Li, Zilin; Pampana, Akhil; Zhang, David Y; Park, Joseph; Aslibekyan, Stella; Bis, Joshua C; Brody, Jennifer A; Cade, Brian E; Chuang, Lee-Ming; Chung, Ren-Hua; Curran, Joanne E; de Las Fuentes, Lisa; de Vries, Paul S; Duggirala, Ravindranath; Freedman, Barry I; Graff, Mariaelisa; Guo, Xiuqing; Heard-Costa, Nancy; Hidalgo, Bertha; Hwu, Chii-Min; Irvin, Marguerite R; Kelly, Tanika N; Kral, Brian G; Lange, Leslie; Li, Xiaohui; Martin, Lisa; Lubitz, Steven A; Manichaikul, Ani W; Preuss, Michael; Montasser, May E; Morrison, Alanna C; Naseri, Take; O'Connell, Jeffrey R; Palmer, Nicholette D; Peyser, Patricia A; Reupena, Muagututia S; Smith, Jennifer A; Sun, Xiao; Taylor, Kent D; Tracy, Russell P; Tsai, Michael Y; Wang, Zhe; Wang, Yuxuan; Bao, Wei; Wilkins, John T; Yanek, Lisa R; Zhao, Wei; Arnett, Donna K.

Nat Commun ; 13(1): 5995, 2022 Oct 11.

Article En | MEDLINE | ID: mdl-36220816

Blood lipids are heritable modifiable causal factors for coronary artery disease. Despite well-described monogenic and polygenic bases of dyslipidemia, limitations remain in discovery of lipid-associated alleles using whole genome sequencing (WGS), partly due to limited sample sizes, ancestral diversity, and interpretation of clinical significance. Among 66,329 ancestrally diverse (56% non-European) participants, we associate 428M variants from deep-coverage WGS with lipid levels; ~400M variants were not assessed in prior lipids genetic analyses. We find multiple lipid-related genes strongly associated with blood lipids through analysis of common and rare coding variants. We discover several associated rare non-coding variants, largely at Mendelian lipid genes. Notably, we observe rare LDLR intronic variants associated with markedly increased LDL-C, similar to rare LDLR exonic variants. In conclusion, we conducted a systematic whole genome scan for blood lipids expanding the alleles linked to lipids for multiple ancestries and characterize a clinically-relevant rare non-coding variant model for lipids.

Genome-Wide Association Study , Lipids , Alleles , Cholesterol, LDL , Humans , Whole Genome Sequencing

14.

A framework for detecting noncoding rare-variant associations of large-scale whole-genome sequencing studies.

Li, Zilin; Li, Xihao; Zhou, Hufeng; Gaynor, Sheila M; Selvaraj, Margaret Sunitha; Arapoglou, Theodore; Quick, Corbin; Liu, Yaowu; Chen, Han; Sun, Ryan; Dey, Rounak; Arnett, Donna K; Auer, Paul L; Bielak, Lawrence F; Bis, Joshua C; Blackwell, Thomas W; Blangero, John; Boerwinkle, Eric; Bowden, Donald W; Brody, Jennifer A; Cade, Brian E; Conomos, Matthew P; Correa, Adolfo; Cupples, L Adrienne; Curran, Joanne E; de Vries, Paul S; Duggirala, Ravindranath; Franceschini, Nora; Freedman, Barry I; Göring, Harald H H; Guo, Xiuqing; Kalyani, Rita R; Kooperberg, Charles; Kral, Brian G; Lange, Leslie A; Lin, Bridget M; Manichaikul, Ani; Manning, Alisa K; Martin, Lisa W; Mathias, Rasika A; Meigs, James B; Mitchell, Braxton D; Montasser, May E; Morrison, Alanna C; Naseri, Take; O'Connell, Jeffrey R; Palmer, Nicholette D; Peyser, Patricia A; Psaty, Bruce M; Raffield, Laura M.

Nat Methods ; 19(12): 1599-1611, 2022 Dec.

Article En | MEDLINE | ID: mdl-36303018

Large-scale whole-genome sequencing studies have enabled analysis of noncoding rare-variant (RV) associations with complex human diseases and traits. Variant-set analysis is a powerful approach to study RV association. However, existing methods have limited ability in analyzing the noncoding genome. We propose a computationally efficient and robust noncoding RV association detection framework, STAARpipeline, to automatically annotate a whole-genome sequencing study and perform flexible noncoding RV association analysis, including gene-centric analysis and fixed window-based and dynamic window-based non-gene-centric analysis by incorporating variant functional annotations. In gene-centric analysis, STAARpipeline uses STAAR to group noncoding variants based on functional categories of genes and incorporate multiple functional annotations. In non-gene-centric analysis, STAARpipeline uses SCANG-STAAR to incorporate dynamic window sizes and multiple functional annotations. We apply STAARpipeline to identify noncoding RV sets associated with four lipid traits in 21,015 discovery samples from the Trans-Omics for Precision Medicine (TOPMed) program and replicate several of them in an additional 9,123 TOPMed samples. We also analyze five non-lipid TOPMed traits.

Genome-Wide Association Study , Genome , Humans , Genome-Wide Association Study/methods , Whole Genome Sequencing/methods , Phenotype , Genetic Variation

15.

Cross-ancestry genome-wide meta-analysis of 61,047 cases and 947,237 controls identifies new susceptibility loci contributing to lung cancer.

Byun, Jinyoung; Han, Younghun; Li, Yafang; Xia, Jun; Long, Erping; Choi, Jiyeon; Xiao, Xiangjun; Zhu, Meng; Zhou, Wen; Sun, Ryan; Bossé, Yohan; Song, Zhuoyi; Schwartz, Ann; Lusk, Christine; Rafnar, Thorunn; Stefansson, Kari; Zhang, Tongwu; Zhao, Wei; Pettit, Rowland W; Liu, Yanhong; Li, Xihao; Zhou, Hufeng; Walsh, Kyle M; Gorlov, Ivan; Gorlova, Olga; Zhu, Dakai; Rosenberg, Susan M; Pinney, Susan; Bailey-Wilson, Joan E; Mandal, Diptasri; de Andrade, Mariza; Gaba, Colette; Willey, James C; You, Ming; Anderson, Marshall; Wiencke, John K; Albanes, Demetrius; Lam, Stephan; Tardon, Adonina; Chen, Chu; Goodman, Gary; Bojeson, Stig; Brenner, Hermann; Landi, Maria Teresa; Chanock, Stephen J; Johansson, Mattias; Muley, Thomas; Risch, Angela; Wichmann, H-Erich; Bickeböller, Heike.

Nat Genet ; 54(8): 1167-1177, 2022 08.

Article En | MEDLINE | ID: mdl-35915169

To identify new susceptibility loci to lung cancer among diverse populations, we performed cross-ancestry genome-wide association studies in European, East Asian and African populations and discovered five loci that have not been previously reported. We replicated 26 signals and identified 10 new lead associations from previously reported loci. Rare-variant associations tended to be specific to populations, but even common-variant associations influencing smoking behavior, such as those with CHRNA5 and CYP2A6, showed population specificity. Fine-mapping and expression quantitative trait locus colocalization nominated several candidate variants and susceptibility genes such as IRF4 and FUBP1. DNA damage assays of prioritized genes in lung fibroblasts indicated that a subset of these genes, including the pleiotropic gene IRF4, potentially exert effects by promoting endogenous DNA damage.

Genome-Wide Association Study , Lung Neoplasms , DNA-Binding Proteins/genetics , Genetic Predisposition to Disease , Humans , Lung Neoplasms/genetics , Polymorphism, Single Nucleotide/genetics , Quantitative Trait Loci/genetics , RNA-Binding Proteins/genetics

16.

STAAR workflow: a cloud-based workflow for scalable and reproducible rare variant analysis.

Gaynor, Sheila M; Westerman, Kenneth E; Ackovic, Lea L; Li, Xihao; Li, Zilin; Manning, Alisa K; Philippakis, Anthony; Lin, Xihong.

Bioinformatics ; 38(11): 3116-3117, 2022 05 26.

Article En | MEDLINE | ID: mdl-35441669

SUMMARY: We developed the variant-Set Test for Association using Annotation infoRmation (STAAR) workflow description language (WDL) workflow to facilitate the analysis of rare variants in whole genome sequencing association studies. The open-access STAAR workflow written in the WDL allows a user to perform rare variant testing for both gene-centric and genetic region approaches, enabling genome-wide, candidate and conditional analyses. It incorporates functional annotations into the workflow as introduced in the STAAR method in order to boost the rare variant analysis power. This tool was specifically developed and optimized to be implemented on cloud-based platforms such as BioData Catalyst Powered by Terra. It provides easy-to-use functionality for rare variant analysis that can be incorporated into an exhaustive whole genome sequencing analysis pipeline. AVAILABILITY AND IMPLEMENTATION: The workflow is freely available from https://dockstore.org/workflows/github.com/sheilagaynor/STAAR_workflow. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Cloud Computing , Software , Workflow , Genome , Genome-Wide Association Study

17.

A multi-dimensional integrative scoring framework for predicting functional variants in the human genome.

Li, Xihao; Yung, Godwin; Zhou, Hufeng; Sun, Ryan; Li, Zilin; Hou, Kangcheng; Zhang, Martin Jinye; Liu, Yaowu; Arapoglou, Theodore; Wang, Chen; Ionita-Laza, Iuliana; Lin, Xihong.

Am J Hum Genet ; 109(3): 446-456, 2022 03 03.

Article En | MEDLINE | ID: mdl-35216679

Attempts to identify and prioritize functional DNA elements in coding and non-coding regions, particularly through use of in silico functional annotation data, continue to increase in popularity. However, specific functional roles can vary widely from one variant to another, making it challenging to summarize different aspects of variant function with a one-dimensional rating. Here we propose multi-dimensional annotation-class integrative estimation (MACIE), an unsupervised multivariate mixed-model framework capable of integrating annotations of diverse origin to assess multi-dimensional functional roles for both coding and non-coding variants. Unlike existing one-dimensional scoring methods, MACIE views variant functionality as a composite attribute encompassing multiple characteristics and estimates the joint posterior functional probabilities of each genomic position. This estimate offers more comprehensive and interpretable information in the presence of multiple aspects of functionality. Applied to a variety of independent coding and non-coding datasets, MACIE demonstrates powerful and robust performance in discriminating between functional and non-functional variants. We also show an application of MACIE to fine-mapping and heritability enrichment analysis by using the lipids GWAS summary statistics data from the European Network for Genetic and Genomic Epidemiology Consortium.

Genome, Human , Genome-Wide Association Study , Genome, Human/genetics , Genome-Wide Association Study/methods , Genomics , Humans , Molecular Sequence Annotation , Polymorphism, Single Nucleotide/genetics , Probability

18.

Spatiotemporal patterns of neuronal subtype genesis suggest hierarchical development of retinal diversity.

West, Emma R; Lapan, Sylvain W; Lee, ChangHee; Kajderowicz, Kathrin M; Li, Xihao; Cepko, Constance L.

Cell Rep ; 38(1): 110191, 2022 01 04.

Article En | MEDLINE | ID: mdl-34986354

How do neuronal subtypes emerge during development? Recent molecular studies have profiled existing neuronal diversity, but neuronal subtype genesis remains elusive. The 15 types of mouse retinal bipolar interneurons are characterized by distinct functions, morphologies, and transcriptional profiles. Here, we develop a comprehensive spatiotemporal map of bipolar subtype genesis in the murine retina. Combining multiplexed detection of 16 RNA markers with timed delivery of 5-ethynyl-2'-deoxyuridine (EdU) and bromodeoxyuridine (BrdU), we analyze more than 30,000 single cells in full retinal sections to classify all bipolar subtypes and their birthdates. We find that bipolar subtype birthdates are ordered and follow a centrifugal developmental axis. Spatial analysis reveals a striking wave pattern of bipolar subtype birthdates, and lineage analyses suggest clonal restriction on homotypic subtype production. These results inspire a hierarchical developmental model, with ordered subtype genesis within lineages. Our results provide insight into neuronal subtype development and establish a framework for studying subtype diversification.

Cell Lineage/physiology , Neurogenesis/physiology , Retinal Bipolar Cells/cytology , Spatio-Temporal Analysis , Animals , Female , Gene Expression Regulation, Developmental/genetics , Male , Mice , Mice, Inbred C57BL , RNA/genetics , Retina/cytology , Retina/metabolism , Retinal Bipolar Cells/metabolism

19.

Surgical Survival Benefits With Different Metastatic Patterns for Stage IV Extrathoracic Metastatic Non-Small Cell Lung Cancer: A SEER-Based Study.

Chao, Ce; Qian, Yongxiang; Li, Xihao; Sang, Chen; Wang, Bin; Zhang, Xiao-Ying.

Technol Cancer Res Treat ; 20: 15330338211033064, 2021.

Article En | MEDLINE | ID: mdl-34496678

BACKGROUND: With growing knowledge of oligometastases, primary surgery plays an increasingly vital role in metastatic non-small cell lung cancer. We aimed to evaluate the survival benefit of primary surgery based on metastatic patterns. MATERIALS AND METHODS: Patients with stage IV extrathoracic metastatic (M1b) non-small cell lung cancer diagnosed between 2010 and 2015 were selected from the Surveillance, Epidemiology, and End Results (SEER) database for a retrospective cohort study. Multiple imputation was used for the missing data. Patients were divided into 2 groups depending on whether surgery was performed. After covariate balancing propensity score (CBPS) weighting, multivariate Cox regression models and Kaplan-Meier survival curves were built to identify the survival benefit of different metastatic patterns. RESULTS: Surgery can potentially increase the overall survival (OS) (adjusted HR: 0.68, P < 0.001) of patients with non-small cell lung cancer. The weighted 3-year OS in the surgical group was 16.9%, compared with 7.8% in the nonsurgical group. For single organ metastasis, surgery could improve the survival of patients with metastatic non-small cell lung cancer. Meanwhile, no significant survival improvement was observed in the surgical group among patients with multiple organ metastases. CONCLUSION: The surgical survival benefits for extrathoracic metastatic non-small cell lung cancer vary by metastatic pattern.

Carcinoma, Non-Small-Cell Lung/epidemiology , Lung Neoplasms/epidemiology , Adolescent , Adult , Aged , Aged, 80 and over , Carcinoma, Non-Small-Cell Lung/pathology , Carcinoma, Non-Small-Cell Lung/surgery , Clinical Decision-Making , Disease Management , Female , Humans , Kaplan-Meier Estimate , Lung Neoplasms/pathology , Lung Neoplasms/surgery , Male , Middle Aged , Neoplasm Metastasis , Neoplasm Staging , Prognosis , SEER Program , Treatment Outcome , United States/epidemiology , Young Adult

20.

Association between Smoking History and Tumor Mutation Burden in Advanced Non-Small Cell Lung Cancer.

Wang, Xinan; Ricciuti, Biagio; Nguyen, Tom; Li, Xihao; Rabin, Michael S; Awad, Mark M; Lin, Xihong; Johnson, Bruce E; Christiani, David C.

Cancer Res ; 81(9): 2566-2573, 2021 05 01.

Article En | MEDLINE | ID: mdl-33653773

Lung carcinogenesis is a complex and stepwise process involving accumulation of genetic mutations in signaling and oncogenic pathways via interactions with environmental factors and host susceptibility. Tobacco exposure is the leading cause of lung cancer, but its relationship to clinically relevant mutations and the composite tumor mutation burden (TMB) has not been fully elucidated. In this study, we investigated the dose-response relationship in a retrospective observational study of 931 patients treated for advanced-stage non-small cell lung cancer (NSCLC) between April 2013 and February 2020 at the Dana-Farber Cancer Institute and Brigham and Women's Hospital. Doubling smoking pack-years was associated with increased KRAS G12C and less frequent EGFR del19 and EGFR L858R mutations, whereas doubling smoking-free months was associated with more frequent EGFR L858R. In advanced lung adenocarcinoma, doubling smoking pack-years was associated with an increase in TMB, whereas doubling smoking-free months was associated with a decrease in TMB, after controlling for age, gender, and stage. There is a significant dose-response association of smoking history with genetic alterations in cancer-related pathways and TMB in advanced lung adenocarcinoma. SIGNIFICANCE: This study clarifies the relationship between smoking history and clinically relevant mutations in non-small cell lung cancer, revealing the potential of smoking history as a surrogate for tumor mutation burden.

Adenocarcinoma of Lung/genetics , Carcinoma, Non-Small-Cell Lung/genetics , Lung Neoplasms/genetics , Mutation , Smoking/adverse effects , Aged , Biomarkers, Tumor/genetics , Carcinogenesis/genetics , ErbB Receptors/genetics , Female , Humans , Male , Middle Aged , Oncogenes , Prospective Studies , Proto-Oncogene Proteins p21(ras)/genetics , Retrospective Studies