1.
medRxiv ; 2024 Apr 19.
Article in English | MEDLINE | ID: mdl-38699360

ABSTRACT

Mosaic loss of Y (mLOY) is the most common somatic chromosomal alteration detected in human blood. The presence of mLOY is associated with altered blood cell counts and increased risk of Alzheimer's disease, solid tumors, and other age-related diseases. We sought to gain a better understanding of genetic drivers and associated phenotypes of mLOY through analyses of whole genome sequencing of a large set of genetically diverse males from the Trans-Omics for Precision Medicine (TOPMed) program. This approach enabled us to identify differences in mLOY frequencies across populations defined by genetic similarity, revealing a higher frequency of mLOY in the European American (EA) ancestry group compared to those of Hispanic American (HA), African American (AA), and East Asian (EAS) ancestry. Further, we identified two genes (CFHR1 and LRP6) that harbor multiple rare, putatively deleterious variants associated with mLOY susceptibility, showed that subsets of human hematopoietic stem cells are enriched for activity of mLOY susceptibility variants, and found that certain alleles on chromosome Y are more likely to be lost than others.

2.
Arch Pathol Lab Med ; 2024 May 27.
Article in English | MEDLINE | ID: mdl-38797720

ABSTRACT

CONTEXT: The National Institutes of Health Genotype-Tissue Expression (GTEx) project was developed to elucidate how genetic variation influences gene expression in multiple normal tissues procured from postmortem donors. OBJECTIVE: To provide critical insight into a biospecimen's suitability for subsequent analysis, each biospecimen underwent quality assessment measures that included evaluation for underlying disease and potential effects introduced by preanalytic factors. DESIGN: Electronic images of each tissue collected from nearly 1000 postmortem donors were evaluated by board-certified pathologists for the extent of autolysis, tissue purity, and the type and abundance of any extraneous tissue. Tissue-specific differences in the severity of autolysis and RNA integrity were evaluated, as were potential relationships between these markers and the duration of postmortem interval and rapidity of death. RESULTS: Tissue-specific challenges in the procurement and preservation of the nearly 30 000 tissue specimens collected during the GTEx project are summarized. Differences in the degree of autolysis and RNA integrity number were observed among the 40 tissue types evaluated, and tissue-specific susceptibilities to the duration of postmortem interval and rapidity of death were observed. CONCLUSIONS: Ninety-five percent of tissues were of sufficient quality to support RNA sequencing analysis. Biospecimens, annotated whole slide images, de-identified clinical data, and genomic data generated for GTEx represent a high-quality and comprehensive resource for the scientific community that has contributed to its use in approximately 1695 articles. Biospecimens and data collected under the GTEx project are available via the GTEx portal and authorized access to the Database of Genotypes and Phenotypes; procedures and whole slide images are available from the National Cancer Institute.

3.
Nat Med ; 2024 Apr 16.
Article in English | MEDLINE | ID: mdl-38627562

ABSTRACT

Reduced insulin sensitivity (insulin resistance) is a hallmark of normal physiology in late pregnancy and also underlies gestational diabetes mellitus (GDM). We conducted transcriptomic profiling of 434 human placentas and identified a positive association between insulin-like growth factor binding protein 1 gene (IGFBP1) expression in the placenta and insulin sensitivity at ~26 weeks gestation. Circulating IGFBP1 protein levels rose over the course of pregnancy and declined postpartum, which, together with high gene expression levels in our placenta samples, suggests a placental or decidual source. Higher circulating IGFBP1 levels were associated with greater insulin sensitivity (lesser insulin resistance) at ~26 weeks gestation in the same cohort and in two additional pregnancy cohorts. In addition, low circulating IGFBP1 levels in early pregnancy predicted subsequent GDM diagnosis in two cohorts of pregnant women. These results implicate IGFBP1 in the glycemic physiology of pregnancy and suggest a role for placental IGFBP1 deficiency in GDM pathogenesis.

4.
Nat Rev Genet ; 2024 Mar 28.
Article in English | MEDLINE | ID: mdl-38548833

ABSTRACT

Germline variation and somatic mutation are intricately connected and together shape human traits and disease risks. Germline variants are present from conception, but they vary between individuals and accumulate over generations. By contrast, somatic mutations accumulate throughout life in a mosaic manner within an individual due to intrinsic and extrinsic sources of mutations and selection pressures acting on cells. Recent advancements, such as improved detection methods and increased resources for association studies, have drastically expanded our ability to investigate germline and somatic genetic variation and compare underlying mutational processes. A better understanding of the similarities and differences in the types, rates and patterns of germline and somatic variants, as well as their interplay, will help elucidate the mechanisms underlying their distinct yet interlinked roles in human health and biology.

5.
Ann Am Thorac Soc ; 21(6): 884-894, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38335160

ABSTRACT

Rationale: Chronic obstructive pulmonary disease (COPD) and emphysema are associated with endothelial damage and altered pulmonary microvascular perfusion. The molecular mechanisms underlying these changes are poorly understood in patients, in part because of the inaccessibility of the pulmonary vasculature. Peripheral blood mononuclear cells (PBMCs) interact with the pulmonary endothelium. Objectives: To test the association between gene expression in PBMCs and pulmonary microvascular perfusion in COPD. Methods: The Multi-Ethnic Study of Atherosclerosis (MESA) COPD Study recruited two independent samples of COPD cases and controls with ⩾10 pack-years of smoking history. In both samples, pulmonary microvascular blood flow, pulmonary microvascular blood volume, and mean transit time were assessed on contrast-enhanced magnetic resonance imaging, and PBMC gene expression was assessed by microarray. Additional replication was performed in a third sample with pulmonary microvascular blood volume measures on contrast-enhanced dual-energy computed tomography. Differential expression analyses were adjusted for age, gender, race/ethnicity, educational attainment, height, weight, smoking status, and pack-years of smoking. Results: The 79 participants in the discovery sample had a mean age of 69 ± 6 years, 44% were female, 25% were non-White, 34% were current smokers, and 66% had COPD. There were large PBMC gene expression signatures associated with pulmonary microvascular perfusion traits, with several replicated in the replication sets with magnetic resonance imaging (n = 47) or dual-energy contrast-enhanced computed tomography (n = 157) measures. Many of the identified genes are involved in inflammatory processes, including nuclear factor-κB and chemokine signaling pathways. 
Conclusions: PBMC gene expression in nuclear factor-κB, inflammatory, and chemokine signaling pathways was associated with pulmonary microvascular perfusion in COPD, potentially offering new targetable candidates for novel therapies.


Subject(s)
Leukocytes, Mononuclear , Magnetic Resonance Imaging , Pulmonary Disease, Chronic Obstructive , Humans , Female , Male , Aged , Leukocytes, Mononuclear/metabolism , Pulmonary Disease, Chronic Obstructive/genetics , Pulmonary Disease, Chronic Obstructive/physiopathology , Middle Aged , Lung/blood supply , Lung/diagnostic imaging , Lung/metabolism , Atherosclerosis/genetics , Atherosclerosis/ethnology , Case-Control Studies , United States/epidemiology , Aged, 80 and over , Gene Expression , Tomography, X-Ray Computed , Pulmonary Circulation , Smoking , Microcirculation
6.
Am J Hum Genet ; 111(3): 445-455, 2024 Mar 07.
Article in English | MEDLINE | ID: mdl-38320554

ABSTRACT

Regulation of transcription and translation are mechanisms through which genetic variants affect complex traits. Expression quantitative trait locus (eQTL) studies have been more successful at identifying cis-eQTL (within 1 Mb of the transcription start site) than trans-eQTL. Here, we tested the cis component of gene expression for association with observed plasma protein levels to identify cis- and trans-acting genes that regulate protein levels. We used transcriptome prediction models from 49 Genotype-Tissue Expression (GTEx) Project tissues to predict the cis component of gene expression and tested the predicted expression of every gene in every tissue for association with the observed abundance of 3,622 plasma proteins measured in 3,301 individuals from the INTERVAL study. We tested significant results for replication in 971 individuals from the Trans-omics for Precision Medicine (TOPMed) Multi-Ethnic Study of Atherosclerosis (MESA). We found 1,168 and 1,210 cis- and trans-acting associations that replicated in TOPMed (FDR < 0.05) with a median expected true positive rate (π1) across tissues of 0.806 and 0.390, respectively. The target proteins of trans-acting genes were enriched for transcription factor binding sites and autoimmune diseases in the GWAS catalog. Furthermore, we found a higher correlation between predicted expression and protein levels of the same underlying gene (R = 0.17) than observed expression (R = 0.10, p = 7.50 × 10⁻¹¹). This indicates the cis-acting genetically regulated (heritable) component of gene expression is more consistent across tissues than total observed expression (genetics + environment) and is useful in uncovering the function of SNPs associated with complex traits.


Subject(s)
Proteome , Transcriptome , Humans , Transcriptome/genetics , Proteome/genetics , Multifactorial Inheritance , Quantitative Trait Loci/genetics , Genome-Wide Association Study , Polymorphism, Single Nucleotide/genetics
7.
Am J Hum Genet ; 111(1): 133-149, 2024 Jan 04.
Article in English | MEDLINE | ID: mdl-38181730

ABSTRACT

Bulk-tissue molecular quantitative trait loci (QTLs) have been the starting point for interpreting disease-associated variants, and context-specific QTLs show particular relevance for disease. Here, we present the results of mapping interaction QTLs (iQTLs) for cell type, age, and other phenotypic variables in multi-omic, longitudinal data from the blood of individuals of diverse ancestries. By modeling the interaction between genotype and estimated cell-type proportions, we demonstrate that cell-type iQTLs could be considered as proxies for cell-type-specific QTL effects, particularly for the most abundant cell type in the tissue. The interpretation of age iQTLs, however, warrants caution because the moderation effect of age on the genotype and molecular phenotype association could be mediated by changes in cell-type composition. Finally, we show that cell-type iQTLs contribute to cell-type-specific enrichment of diseases that, in combination with additional functional data, could guide future functional studies. Overall, this study highlights the use of iQTLs to gain insights into the context specificity of regulatory effects.


Subject(s)
Gene Expression Regulation , Quantitative Trait Loci , Humans , Quantitative Trait Loci/genetics , Genotype , Phenotype
8.
J Clin Endocrinol Metab ; 109(3): e1159-e1166, 2024 Feb 20.
Article in English | MEDLINE | ID: mdl-37864851

ABSTRACT

CONTEXT: Elevated body mass index (BMI) in pregnancy is associated with adverse maternal and fetal outcomes. The placental transcriptome may elucidate molecular mechanisms underlying these associations. OBJECTIVE: We examined the association of first-trimester maternal BMI with the placental transcriptome in the Gen3G prospective cohort. METHODS: We enrolled participants at 5 to 16 weeks of gestation and measured height and weight. We collected placenta samples at delivery. We performed whole-genome RNA sequencing using Illumina HiSeq 4000 and aligned RNA sequences based on the GTEx v8 pipeline. We conducted differential gene expression analysis of over 15 000 genes from 450 placental samples and reported the change in normalized gene expression per 1-unit increase in log2 BMI (kg/m²) as a continuous variable using Limma Voom. We adjusted models for maternal age, fetal sex, gestational age at delivery, gravidity, and surrogate variables accounting for technical variability. We compared participants with BMI of 18.5 to 24.9 kg/m² (N = 257) vs those with obesity (BMI ≥30 kg/m², N = 82) in secondary analyses. RESULTS: Participants' mean ± SD age was 28.2 ± 4.4 years and BMI was 25.4 ± 5.5 kg/m² in early pregnancy. Higher maternal BMI was associated with lower placental expression of EPYC (slope = -1.94, false discovery rate [FDR]-adjusted P = 7.3 × 10⁻⁶ for continuous BMI; log2 fold change = -1.35, FDR-adjusted P = 3.4 × 10⁻³ for BMI ≥30 vs BMI 18.5-24.9 kg/m²) and with higher placental expression of IGFBP6, CHRDL1, and CXCL13 after adjustment for covariates and accounting for multiple testing (FDR < 0.05). CONCLUSION: Our genome-wide transcriptomic study revealed novel genes potentially implicated in placental biologic response to higher maternal BMI in early pregnancy.


Subject(s)
Placenta , Transcriptome , Pregnancy , Humans , Female , Young Adult , Adult , Body Mass Index , Placenta/metabolism , Prospective Studies , Gene Expression Profiling
9.
Res Sq ; 2023 Oct 27.
Article in English | MEDLINE | ID: mdl-37961187

ABSTRACT

Reduced insulin sensitivity (or greater insulin resistance) is a hallmark of normal physiology in late pregnancy and also underlies gestational diabetes mellitus (GDM) pathophysiology. We conducted transcriptomic profiling of 434 human placentas and identified a strong positive association between insulin-like growth factor binding protein 1 gene (IGFBP1) expression in the placenta and insulin sensitivity at ~ 26 weeks' gestation. Circulating IGFBP1 protein levels rose over the course of pregnancy and declined postpartum, which together with high placental gene expression levels, suggests a placental source. Higher circulating IGFBP1 levels were strongly associated with greater insulin sensitivity (lesser insulin resistance) at ~ 26 weeks' gestation in the same cohort and two additional pregnancy cohorts. In addition, low circulating IGFBP1 levels in early pregnancy predicted subsequent GDM diagnosis in two cohorts. These results implicate IGFBP1 in the glycemic physiology of pregnancy and suggest a role for placental IGFBP1 deficiency in GDM pathogenesis.

10.
Cell Genom ; 3(10): 100401, 2023 Oct 11.
Article in English | MEDLINE | ID: mdl-37868038

ABSTRACT

Each human genome has tens of thousands of rare genetic variants; however, identifying impactful rare variants remains a major challenge. We demonstrate how use of personal multi-omics can enable identification of impactful rare variants by using the Multi-Ethnic Study of Atherosclerosis, which included several hundred individuals, with whole-genome sequencing, transcriptomes, methylomes, and proteomes collected across two time points, 10 years apart. We evaluated each multi-omics phenotype's ability to separately and jointly inform functional rare variation. By combining expression and protein data, we observed rare stop variants 62 times and rare frameshift variants 216 times as frequently as controls, compared to 13-27 times as frequently for expression or protein effects alone. We extended a Bayesian hierarchical model, "Watershed," to prioritize specific rare variants underlying multi-omics signals across the regulatory cascade. With this approach, we identified rare variants that exhibited large effect sizes on multiple complex traits including height, schizophrenia, and Alzheimer's disease.

11.
HGG Adv ; 4(4): 100216, 2023 Oct 12.
Article in English | MEDLINE | ID: mdl-37869564

ABSTRACT

Transcriptome prediction models built with data from European-descent individuals are less accurate when applied to different populations because of differences in linkage disequilibrium patterns and allele frequencies. We hypothesized that methods that leverage shared regulatory effects across different conditions, in this case, across different populations, may improve cross-population transcriptome prediction. To test this hypothesis, we made transcriptome prediction models for use in transcriptome-wide association studies (TWASs) using different methods (elastic net, joint-tissue imputation [JTI], matrix expression quantitative trait loci [Matrix eQTL], multivariate adaptive shrinkage in R [MASHR], and transcriptome-integrated genetic association resource [TIGAR]) and tested their out-of-sample transcriptome prediction accuracy in population-matched and cross-population scenarios. Additionally, to evaluate model applicability in TWASs, we integrated publicly available multiethnic genome-wide association study (GWAS) summary statistics from the Population Architecture using Genomics and Epidemiology (PAGE) study and Pan-ancestry genetic analysis of the UK Biobank (PanUKBB) with our developed transcriptome prediction models. In regard to transcriptome prediction accuracy, MASHR models performed better or the same as other methods in both population-matched and cross-population transcriptome predictions. Furthermore, in multiethnic TWASs, MASHR models yielded more discoveries that replicate in both PAGE and PanUKBB across all methods analyzed, including loci previously mapped in GWASs and loci previously not found in GWASs. Overall, our study demonstrates the importance of using methods that benefit from different populations' effect size estimates in order to improve TWASs for multiethnic or underrepresented populations.


Subject(s)
Genome-Wide Association Study , Transcriptome , Humans , Transcriptome/genetics , Quantitative Trait Loci/genetics , Gene Frequency , Linkage Disequilibrium
12.
bioRxiv ; 2023 Aug 21.
Article in English | MEDLINE | ID: mdl-37662416

ABSTRACT

Blood lipid traits are treatable and heritable risk factors for heart disease, a leading cause of mortality worldwide. Although genome-wide association studies (GWAS) have discovered hundreds of variants associated with lipids in humans, most of the causal mechanisms of lipids remain unknown. To better understand the biological processes underlying lipid metabolism, we investigated the associations of plasma protein levels with total cholesterol (TC), triglycerides (TG), high-density lipoprotein cholesterol (HDL), and low-density lipoprotein cholesterol (LDL) in blood. We trained protein prediction models based on samples in the Multi-Ethnic Study of Atherosclerosis (MESA) and applied them to conduct proteome-wide association studies (PWAS) for lipids using the Global Lipids Genetics Consortium (GLGC) data. Of the 749 proteins tested, 42 were significantly associated with at least one lipid trait. Furthermore, we performed transcriptome-wide association studies (TWAS) for lipids using 9,714 gene expression prediction models trained on samples from peripheral blood mononuclear cells (PBMCs) in MESA and 49 tissues in the Genotype-Tissue Expression (GTEx) project. We found that although PWAS and TWAS can show different directions of associations in an individual gene, 40 out of 49 tissues showed a positive correlation between PWAS and TWAS signed p-values across all the genes, which suggests a high-level consistency between proteome-lipid associations and transcriptome-lipid associations.

13.
Nat Genet ; 55(10): 1665-1676, 2023 Oct.
Article in English | MEDLINE | ID: mdl-37770633

ABSTRACT

Genetic variants associated with complex traits are primarily noncoding, and their effects on gene-regulatory activity remain largely uncharacterized. To address this, we profile epigenomic variation of histone mark H3K27ac across 387 brain, heart, muscle and lung samples from Genotype-Tissue Expression (GTEx). We annotate 282 k active regulatory elements (AREs) with tissue-specific activity patterns. We identify 2,436 sex-biased AREs and 5,397 genetically influenced AREs associated with 130 k genetic variants (haQTLs) across tissues. We integrate genetic and epigenomic variation to provide mechanistic insights for disease-associated loci from 55 genome-wide association studies (GWAS), by revealing candidate tissues of action, driver SNPs and impacted AREs. Lastly, we build ARE-gene linking scores based on genetics (gLink scores) and demonstrate their unique ability to prioritize SNP-ARE-gene circuits. Overall, our epigenomic datasets, computational integration and mechanistic predictions provide valuable resources and important insights for understanding the molecular basis of human diseases/traits such as schizophrenia.


Subject(s)
Epigenomics , Genome-Wide Association Study , Humans , Quantitative Trait Loci/genetics , Genotype , Gene Regulatory Networks , Polymorphism, Single Nucleotide/genetics , Genetic Predisposition to Disease
14.
Cell Genom ; 3(8): 100359, 2023 Aug 09.
Article in English | MEDLINE | ID: mdl-37601969

ABSTRACT

Multi-omics datasets are becoming more common, necessitating better integration methods to realize their revolutionary potential. Here, we introduce multi-set correlation and factor analysis (MCFA), an unsupervised integration method tailored to the unique challenges of high-dimensional genomics data that enables fast inference of shared and private factors. We used MCFA to integrate methylation markers, protein expression, RNA expression, and metabolite levels in 614 diverse samples from the Trans-Omics for Precision Medicine/Multi-Ethnic Study of Atherosclerosis multi-omics pilot. Samples cluster strongly by ancestry in the shared space, even in the absence of genetic information, while private spaces frequently capture dataset-specific technical variation. Finally, we integrated genetic data by conducting a genome-wide association study (GWAS) of our inferred factors, observing that several factors are enriched for GWAS hits and trans-expression quantitative trait loci. Two of these factors appear to be related to metabolic disease. Our study provides a foundation and framework for further integrative analysis of ever larger multi-modal genomic datasets.

15.
bioRxiv ; 2023 Jun 29.
Article in English | MEDLINE | ID: mdl-37425716

ABSTRACT

Bulk tissue molecular quantitative trait loci (QTLs) have been the starting point for interpreting disease-associated variants, while context-specific QTLs show particular relevance for disease. Here, we present the results of mapping interaction QTLs (iQTLs) for cell type, age, and other phenotypic variables in multi-omic, longitudinal data from blood of individuals of diverse ancestries. By modeling the interaction between genotype and estimated cell type proportions, we demonstrate that cell type iQTLs could be considered as proxies for cell type-specific QTL effects. The interpretation of age iQTLs, however, warrants caution as the moderation effect of age on the genotype and molecular phenotype association may be mediated by changes in cell type composition. Finally, we show that cell type iQTLs contribute to cell type-specific enrichment of diseases that, in combination with additional functional data, may guide future functional studies. Overall, this study highlights iQTLs to gain insights into the context-specificity of regulatory effects.

16.
Nat Genet ; 55(8): 1267-1276, 2023 08.
Article in English | MEDLINE | ID: mdl-37443254

ABSTRACT

Genome-wide association studies (GWASs) are a valuable tool for understanding the biology of complex human traits and diseases, but associated variants rarely point directly to causal genes. In the present study, we introduce a new method, polygenic priority score (PoPS), that learns trait-relevant gene features, such as cell-type-specific expression, to prioritize genes at GWAS loci. Using a large evaluation set of genes with fine-mapped coding variants, we show that PoPS and the closest gene individually outperform other gene prioritization methods, but observe the best overall performance by combining PoPS with orthogonal methods. Using this combined approach, we prioritize 10,642 unique gene-trait pairs across 113 complex traits and diseases with high precision, finding not only well-established gene-trait relationships but nominating new genes at unresolved loci, such as LGR4 for estimated glomerular filtration rate and CCR7 for deep vein thrombosis. Overall, we demonstrate that PoPS provides a powerful addition to the gene prioritization toolbox.


Subject(s)
Multifactorial Inheritance , Quantitative Trait Loci , Humans , Multifactorial Inheritance/genetics , Quantitative Trait Loci/genetics , Genome-Wide Association Study/methods , Genetic Predisposition to Disease/genetics , Phenotype , Polymorphism, Single Nucleotide/genetics
17.
J Clin Invest ; 133(18)2023 09 15.
Article in English | MEDLINE | ID: mdl-37498674

ABSTRACT

Clonal hematopoiesis of indeterminate potential (CHIP) is associated with an increased risk of cardiovascular diseases (CVDs), putatively via inflammasome activation. We pursued an inflammatory gene modifier scan for CHIP-associated CVD risk among 424,651 UK Biobank participants. We identified CHIP using whole-exome sequencing data of blood DNA and modeled as a composite, considering all driver genes together, as well as separately for common drivers (DNMT3A, TET2, ASXL1, and JAK2). We developed predicted gene expression scores for 26 inflammasome-related genes and assessed how they modify CHIP-associated CVD risk. We identified IL1RAP as a potential key molecule for CHIP-associated CVD risk across genes and increased AIM2 gene expression leading to heightened JAK2- and ASXL1-associated CVD risk. We show that CRISPR-induced Asxl1-mutated murine macrophages had a particularly heightened inflammatory response to AIM2 agonism, associated with an increased DNA damage response, as well as increased IL-10 secretion, mirroring a CVD-protective effect of IL10 expression in ASXL1 CHIP. Our study supports the role of inflammasomes in CHIP-associated CVD and provides evidence to support gene-specific strategies to address CHIP-associated CVD risk.


Subject(s)
Cardiovascular Diseases , Humans , Animals , Mice , Cardiovascular Diseases/genetics , Clonal Hematopoiesis/genetics , Risk Factors , Inflammasomes/genetics , Hematopoiesis/genetics , Inflammation/genetics , Inflammation/complications , Heart Disease Risk Factors , Mutation
18.
PLoS Genet ; 19(5): e1010517, 2023 05.
Article in English | MEDLINE | ID: mdl-37216410

ABSTRACT

Integrative approaches that simultaneously model multi-omics data have gained increasing popularity because they provide holistic system biology views of multiple or all components in a biological system of interest. Canonical correlation analysis (CCA) is a correlation-based integrative method designed to extract latent features shared between multiple assays by finding the linear combinations of features (referred to as canonical variables, CVs) within each assay that achieve maximal across-assay correlation. Although widely acknowledged as a powerful approach for multi-omics data, CCA has not been systematically applied to multi-omics data in large cohort studies, which have only recently become available. Here, we adapted sparse multiple CCA (SMCCA), a widely used derivative of CCA, to proteomics and methylomics data from the Multi-Ethnic Study of Atherosclerosis (MESA) and Jackson Heart Study (JHS). To tackle challenges encountered when applying SMCCA to MESA and JHS, our adaptations include the incorporation of the Gram-Schmidt (GS) algorithm with SMCCA to improve orthogonality among CVs, and the development of Sparse Supervised Multiple CCA (SSMCCA) to allow supervised integration analysis for more than two assays. Effective application of SMCCA to the two real datasets reveals important findings. Applying our SMCCA-GS to MESA and JHS, we identified strong associations between blood cell counts and protein abundance, suggesting that adjustment of blood cell composition should be considered in protein-based association studies. Importantly, CVs obtained from two independent cohorts also demonstrate transferability across the cohorts. For example, proteomic CVs learned from JHS, when transferred to MESA, explain similar amounts of blood cell count phenotypic variance in MESA, explaining 39.0% ~ 50.0% variation in JHS and 38.9% ~ 49.1% in MESA. Similar transferability was observed for other omics-CV-trait pairs. 
This suggests that biologically meaningful and cohort-agnostic variation is captured by CVs. We anticipate that applying our SMCCA-GS and SSMCCA on various cohorts would help identify cohort-agnostic biologically meaningful relationships between multi-omics data and phenotypic traits.


Subject(s)
Canonical Correlation Analysis , Proteomics , Humans , Proteomics/methods , Multiomics , Cohort Studies
19.
Science ; 380(6641): eabn7113, 2023 04 14.
Article in English | MEDLINE | ID: mdl-37053313

ABSTRACT

Postzygotic mutations (PZMs) begin to accrue in the human genome immediately after fertilization, but how and when PZMs affect development and lifetime health remain unclear. To study the origins and functional consequences of PZMs, we generated a multitissue atlas of PZMs spanning 54 tissue and cell types from 948 donors. Nearly half the variation in mutation burden among tissue samples can be explained by measured technical and biological effects, and 9% can be attributed to donor-specific effects. Through phylogenetic reconstruction of PZMs, we found that their type and predicted functional impact vary during prenatal development, across tissues, and through the germ cell life cycle. Thus, methods for interpreting effects across the body and the life span are needed to fully understand the consequences of genetic variants.


Subject(s)
DNA Mutational Analysis , Longevity , Zygote , Female , Humans , Longevity/genetics , Mutation , Phylogeny , RNA-Seq
20.
medRxiv ; 2023 Mar 28.
Article in English | MEDLINE | ID: mdl-36993312

ABSTRACT

Human genetic variation has enabled the identification of several key regulators of fetal-to-adult hemoglobin switching, including BCL11A, resulting in therapeutic advances. However, despite the progress made, limited further insights have been obtained to provide a fuller accounting of how genetic variation contributes to the global mechanisms of fetal hemoglobin (HbF) gene regulation. Here, we have conducted a multi-ancestry genome-wide association study of 28,279 individuals from several cohorts spanning 5 continents to define the architecture of human genetic variation impacting HbF. We have identified a total of 178 conditionally independent genome-wide significant or suggestive variants across 14 genomic windows. Importantly, these new data enable us to better define the mechanisms by which HbF switching occurs in vivo. We conduct targeted perturbations to define BACH2 as a new genetically-nominated regulator of hemoglobin switching. We define putative causal variants and underlying mechanisms at the well-studied BCL11A and HBS1L-MYB loci, illuminating the complex variant-driven regulation present at these loci. We additionally show how rare large-effect deletions in the HBB locus can interact with polygenic variation to influence HbF levels. Our study paves the way for the next generation of therapies to more effectively induce HbF in sickle cell disease and β-thalassemia.
