ABSTRACT
In animals, the nervous system evolved as the primary interface between multicellular organisms and the environment. As organisms became larger and more complex, the primary functions of the nervous system expanded to include the modulation and coordination of individual responsive cells via paracrine and synaptic functions, as well as the monitoring and maintenance of the organism's own internal environment. This was initially accomplished via paracrine signaling and eventually through the assembly of multicell circuits in some lineages. Cells with similar functions and centralized nervous systems have independently arisen in several lineages. We highlight the molecular mechanisms that underlie parallel diversifications of the nervous system.
Subject(s)
Nervous System , Animals , Nervous System/metabolism , Biological Evolution , Humans , Signal Transduction/genetics
ABSTRACT
Divergence of gene function is a hallmark of evolution, but assessing functional divergence over deep time is not trivial. The few alleles available for cross-species studies often fail to expose the entire functional spectrum of genes, potentially obscuring deeply conserved pleiotropic roles. Here, we explore the functional divergence of WUSCHEL HOMEOBOX9 (WOX9), suggested to have species-specific roles in embryo and inflorescence development. Using a cis-regulatory editing drive system, we generate a comprehensive allelic series in tomato, which revealed hidden pleiotropic roles for WOX9. Analysis of accessible chromatin and conserved cis-regulatory sequences identifies the regions responsible for this pleiotropic activity, the functions of which are conserved in groundcherry, a tomato relative. Mimicking these alleles in Arabidopsis, distantly related to tomato and groundcherry, reveals new inflorescence phenotypes, exposing a deeply conserved pleiotropy. We suggest that targeted cis-regulatory mutations can uncover conserved gene functions and reduce undesirable effects in crop improvement.
Subject(s)
Genes, Plant , Genetic Pleiotropy/genetics , Homeodomain Proteins/genetics , Plant Proteins/genetics , Regulatory Sequences, Nucleic Acid/genetics , Alleles , Arabidopsis/genetics , CRISPR-Cas Systems/genetics , Chromatin/metabolism , Gene Expression Regulation, Plant , Inflorescence/genetics , Solanum lycopersicum/genetics , Mutagenesis , Plant Development/genetics , Plants, Genetically Modified/genetics , Plants, Genetically Modified/growth & development , Plants, Genetically Modified/metabolism , Promoter Regions, Genetic , Solanaceae/genetics , Solanaceae/growth & development
ABSTRACT
Genetic influences on psychiatric disorders transcend diagnostic boundaries, suggesting substantial pleiotropy of contributing loci. However, the nature and mechanisms of these pleiotropic effects remain unclear. We performed analyses of 232,964 cases and 494,162 controls from genome-wide studies of anorexia nervosa, attention-deficit/hyperactivity disorder, autism spectrum disorder, bipolar disorder, major depression, obsessive-compulsive disorder, schizophrenia, and Tourette syndrome. Genetic correlation analyses revealed a meaningful structure within the eight disorders, identifying three groups of inter-related disorders. Meta-analysis across these eight disorders detected 109 loci associated with at least two psychiatric disorders, including 23 loci with pleiotropic effects on four or more disorders and 11 loci with antagonistic effects on multiple disorders. The pleiotropic loci are located within genes that show heightened expression in the brain throughout the lifespan, beginning prenatally in the second trimester, and play prominent roles in neurodevelopmental processes. These findings have important implications for psychiatric nosology, drug development, and risk prediction.
Subject(s)
Genetic Pleiotropy , Genetic Predisposition to Disease , Mental Disorders/genetics , Quantitative Trait Loci , Genome-Wide Association Study , Humans , Neurogenesis
ABSTRACT
Most secreted growth factors and cytokines are functionally pleiotropic because their receptors are expressed on diverse cell types. While important for normal mammalian physiology, pleiotropy limits the efficacy of cytokines and growth factors as therapeutics. Stem cell factor (SCF) is a growth factor that acts through the c-Kit receptor tyrosine kinase to elicit hematopoietic progenitor expansion but can be toxic when administered in vivo because it concurrently activates mast cells. We engineered a mechanism-based SCF partial agonist that impaired c-Kit dimerization, truncating downstream signaling amplitude. This SCF variant elicited biased activation of hematopoietic progenitors over mast cells in vitro and in vivo. Mouse models of SCF-mediated anaphylaxis, radioprotection, and hematopoietic expansion revealed that this SCF partial agonist retained therapeutic efficacy while exhibiting virtually no anaphylactic off-target effects. The approach of biasing cell activation by tuning signaling thresholds and outputs has applications to many dimeric receptor-ligand systems.
Subject(s)
Anaphylaxis/metabolism , Hematopoietic Stem Cells/immunology , Mast Cells/metabolism , Proto-Oncogene Proteins c-kit/metabolism , Signal Transduction , Stem Cell Factor/metabolism , Anaphylaxis/immunology , Animals , Dimerization , Humans , Mast Cells/immunology , Mice , Mice, Inbred C57BL , Models, Molecular , Protein Engineering , Proto-Oncogene Proteins c-kit/agonists , Proto-Oncogene Proteins c-kit/chemistry , Stem Cell Factor/chemistry , Stem Cell Factor/genetics
ABSTRACT
Genome-wide association studies (GWASs) have ushered in a new era of reproducible discovery in psychiatric genetics. The field has now identified hundreds of common genetic variants that are associated with mental disorders, and many of them influence more than one disorder. By advancing the understanding of causal biology underlying psychopathology, GWAS results are poised to inform the development of novel therapeutics, stratification of at-risk patients, and perhaps even the revision of top-down classification systems in psychiatry. Here, we provide a concise review of GWAS findings with an emphasis on findings that have elucidated the shared genetic etiology of psychopathology, summarizing insights at three levels of analysis: 1) genome-wide architecture; 2) networks, pathways, and gene sets; and 3) individual variants/genes. Three themes emerge from these efforts. First, all psychiatric phenotypes are heritable, highly polygenic, and influenced by many pleiotropic variants with incomplete penetrance. Second, GWAS results highlight the broad etiological roles of neuronal biology, system-wide effects over localized effects, and early neurodevelopment as a critical period. Third, many loci that are robustly associated with multiple forms of psychopathology harbor genes that are involved in synaptic structure and function. Finally, we conclude our review by discussing the implications that GWAS results hold for the field of psychiatry, as well as expected challenges and future directions in the next stage of psychiatric genetics.
Subject(s)
Genome-Wide Association Study , Mental Disorders , Humans , Genome-Wide Association Study/methods , Genetic Predisposition to Disease , Mental Disorders/genetics , Phenotype
ABSTRACT
Anthropogeny is a classic term encompassing transdisciplinary investigations of the origins of the human species. Comparative anthropogeny is a systematic comparison of humans and other living nonhuman hominids (so-called "great apes"), aiming to identify distinctly human features in health and disease, with the overall goal of explaining human origins. We begin with a historical perspective, briefly describing how the field progressed from the earliest evolutionary insights to the current emphasis on in-depth molecular and genomic investigations of "human-specific" biology and an increased appreciation for cultural impacts on human biology. While many such genetic differences between humans and other hominids have been revealed over the last two decades, this information remains insufficient to explain the most distinctive phenotypic traits distinguishing humans from other living hominids. Here we undertake a complementary approach of "comparative physiological anthropogeny," along the lines of the preclinical medical curriculum, i.e., beginning with anatomy and considering each physiological system and in each case considering genetic and molecular components that are relevant. What is ultimately needed is a systematic comparative approach at all levels from molecular to physiological to sociocultural, building networks of related information, drawing inferences, and generating testable hypotheses. The concluding section will touch on distinctive considerations in the study of human evolution, including the importance of gene-culture interactions.
Subject(s)
Biological Evolution , Hominidae , Animals , Humans , Hominidae/genetics , Genome , Phenotype
ABSTRACT
As crucial mediators and regulators of our immune system, cytokines are involved in a broad range of biological processes and are implicated in various disease pathologies. The field of cytokine therapeutics has gained much momentum from the maturation of conventional protein engineering methodologies such as structure-based designs and/or directed evolution, which is further aided by the advent of in silico protein designs and characterization. Just within the past 5 years, there has been an explosion of proof-of-concept, preclinical, and clinical studies that utilize an armory of protein engineering methods to develop cytokine-based drugs. Here, we highlight the key engineering strategies undertaken by recent studies that aim to improve the pharmacodynamic and pharmacokinetic profile of interferons and other cytokines as therapeutics.
Subject(s)
Cytokines , Interferons , Interferons/therapeutic use , Immunotherapy/methods
ABSTRACT
Adaptive evolution often involves structural variation affecting genes or cis-regulatory changes that engender novel and favorable gain-of-function gene regulation. Such mutation could result in a favorable dominant trait. At the same time, the gene product could be dosage sensitive if its change in concentration disrupts another trait. As a result, the mutant allele would display dosage-sensitive pleiotropy (DSP). By minimizing imbalance while conserving the favorable dominant effect, heterozygosity can increase fitness and result in heterosis. The properties of these alleles are consistent with evidence from multiple studies that indicate increased fitness of heterozygous regulatory mutations. DSP can help explain mysterious properties of heterosis as well as other effects of hybridization.
Subject(s)
Alleles , Humans , Hybrid Vigor/genetics , Mutation , Animals , Genetic Pleiotropy , Heterozygote , Gene Expression Regulation/genetics , Evolution, Molecular
ABSTRACT
Mendelian randomization uses genetic variants as instrumental variables to make causal inferences on the effect of an exposure on an outcome. Due to the recent abundance of high-powered genome-wide association studies, many putative causal exposures of interest have large numbers of independent genetic variants with which they associate, each representing a potential instrument for use in a Mendelian randomization analysis. Such polygenic analyses increase the power of the study design to detect causal effects; however, they also increase the potential for bias due to instrument invalidity. Recent attention has been given to dealing with bias caused by correlated pleiotropy, which results from violation of the "instrument strength independent of direct effect" assumption. Although methods have been proposed that can account for this bias, a number of restrictive conditions remain in many commonly used techniques. In this paper, we propose a Bayesian framework for Mendelian randomization that provides valid causal inference under very general settings. We propose the methods MR-Horse and MVMR-Horse, which can be performed without access to individual-level data, using only summary statistics of the type commonly published by genome-wide association studies, and can account for both correlated and uncorrelated pleiotropy. In simulation studies, we show that the approach retains type I error rates below nominal levels even in high-pleiotropy scenarios. We demonstrate the proposed approaches in applied examples in both univariable and multivariable settings, some with very weak instruments.
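For orientation, the summary-statistic setting described above can be illustrated with the standard fixed-effect inverse-variance weighted (IVW) estimator. This is a minimal sketch of that textbook baseline, not of MR-Horse or MVMR-Horse; the function name `ivw_estimate` and the toy numbers are hypothetical, and the sketch assumes every variant is a valid instrument (no pleiotropy), which is exactly the assumption the Bayesian methods above aim to relax.

```python
from math import sqrt


def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect IVW causal estimate from per-variant summary statistics.

    Assumes all variants are valid, independent instruments (hypothetical
    simplification; no correction for correlated or uncorrelated pleiotropy).
    """
    # Weight each variant by the inverse variance of its outcome association.
    weights = [1.0 / se ** 2 for se in se_outcome]
    # Weighted regression of outcome effects on exposure effects, no intercept.
    numerator = sum(w * bx * by
                    for w, bx, by in zip(weights, beta_exposure, beta_outcome))
    denominator = sum(w * bx * bx for w, bx in zip(weights, beta_exposure))
    estimate = numerator / denominator
    std_error = sqrt(1.0 / denominator)
    return estimate, std_error


# Toy summary statistics (made up for illustration only).
estimate, std_error = ivw_estimate(
    beta_exposure=[0.1, 0.2, 0.3],
    beta_outcome=[0.05, 0.10, 0.15],
    se_outcome=[0.01, 0.01, 0.01],
)
```

In this toy input every outcome effect is exactly half the exposure effect, so the estimate recovers a causal effect of 0.5; real data with pleiotropic variants would bias this simple estimator.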
Subject(s)
Genome-Wide Association Study , Mendelian Randomization Analysis , Animals , Horses , Bayes Theorem , Computer Simulation , Multifactorial Inheritance
ABSTRACT
We present shaPRS, a method that leverages widespread pleiotropy between traits or shared genetic effects across ancestries, to improve the accuracy of polygenic scores. The method uses genome-wide summary statistics from two diseases or ancestries to improve the genetic effect estimate and standard error at SNPs where there is homogeneity of effect between the two datasets. When there is significant evidence of heterogeneity, the genetic effect from the disease or population closest to the target population is maintained. We show via simulation and a series of real-world examples that shaPRS substantially enhances the accuracy of polygenic risk scores (PRSs) for complex diseases and greatly improves PRS performance across ancestries. shaPRS is a PRS pre-processing method that is agnostic to the actual PRS generation method, and as a result, it can be integrated into existing PRS generation pipelines and continue to be applied as more performant PRS methods are developed over time.
Subject(s)
Genetic Predisposition to Disease , Genome-Wide Association Study , Multifactorial Inheritance , Polymorphism, Single Nucleotide , Multifactorial Inheritance/genetics , Humans , Models, Genetic , Computer Simulation , Genetic Pleiotropy , Phenotype
ABSTRACT
Whereas 16p11.2 BP4-5 copy-number variants (CNVs) represent one of the most pleiotropic etiologies of genomic syndromes in both clinical and population cohorts, the mechanisms leading to such pleiotropy remain understudied. Identifying 73 deletion and 89 duplication carrier individuals among unrelated White British UK Biobank participants, we performed a phenome-wide association study (PheWAS) between the region's copy number and 117 complex traits and diseases, mimicking four dosage models. Forty-six phenotypes (39%) were affected by 16p11.2 BP4-5 CNVs, with the deletion-only, mirror, U-shape, and duplication-only models being the best fit for 30, 10, 4, and 2 phenotypes, respectively, aligning with the stronger deleteriousness of the deletion. Upon individually adjusting CNV effects for either body mass index (BMI), height, or educational attainment (EA), we found that sixteen testable deletion-driven associations (primarily with cardiovascular and metabolic traits) were BMI dependent, with EA playing a more subtle role and no association depending on height. Bidirectional Mendelian randomization supported that 13 out of these 16 associations were secondary consequences of the CNV's impact on BMI. For the 23 traits that remained significantly associated upon individual adjustment for mediators, matched-control analyses found that 10 phenotypes, including musculoskeletal traits, liver enzymes, fluid intelligence, platelet count, and pneumonia and acute kidney injury risk, remained associated under strict Bonferroni correction, with 10 additional nominally significant associations. These results paint a complex picture of 16p11.2 BP4-5's pleiotropic pattern that involves direct effects on multiple physiological systems and indirect co-morbidities consequential to the CNV's impact on BMI and EA, acting through trait-specific dosage mechanisms.
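The four dosage models named above can be pictured as four ways of turning a copy number into a regression predictor. The abstract does not give the coding, so the sketch below is an assumption: copy number is taken as 1 (deletion), 2 (no CNV), or 3 (duplication), and the function name `encode_dosage` and the model labels are hypothetical.

```python
def encode_dosage(copy_number, model):
    """Encode a copy number (1, 2, or 3) as a predictor under a dosage model.

    The codings are an assumed illustration, not taken from the study:
      mirror           deletion and duplication have opposite effects
      u_shape          deletion and duplication have same-direction effects
      deletion_only    only the deletion has an effect
      duplication_only only the duplication has an effect
    """
    if copy_number not in (1, 2, 3):
        raise ValueError("copy_number must be 1, 2, or 3")
    encoders = {
        "mirror": lambda cn: cn - 2,
        "u_shape": lambda cn: abs(cn - 2),
        "deletion_only": lambda cn: int(cn == 1),
        "duplication_only": lambda cn: int(cn == 3),
    }
    if model not in encoders:
        raise ValueError(f"unknown model: {model}")
    return encoders[model](copy_number)


# Predictor values for deletion, non-carrier, and duplication under each model.
table = {
    model: [encode_dosage(cn, model) for cn in (1, 2, 3)]
    for model in ("mirror", "u_shape", "deletion_only", "duplication_only")
}
```

Under these assumed codings, each phenotype would be regressed on one predictor per model, and the best-fitting model would be chosen per trait.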
ABSTRACT
Recurrent genomic rearrangements at 16p11.2 BP4-5 represent one of the most common causes of genomic disorders. Originally associated with increased risk for autism spectrum disorder, schizophrenia, and intellectual disability, as well as adiposity and head circumference, these CNVs have since been associated with a plethora of phenotypic alterations, albeit with high variability in expressivity and incomplete penetrance. Here, we comprehensively review the pleiotropy associated with 16p11.2 BP4-5 rearrangements to shine light on its full phenotypic spectrum. Illustrating this phenotypic heterogeneity, we expose many parallels between findings gathered from clinical versus population-based cohorts, which often point to the same physiological systems, and emphasize the role of the CNV beyond neuropsychiatric and anthropometric traits. Revealing the complex and variable clinical manifestations of this CNV is crucial for accurate diagnosis and personalized treatment strategies for carrier individuals. Furthermore, we discuss areas of research that will be key to identifying factors contributing to phenotypic heterogeneity and gaining mechanistic insights into the molecular pathways underlying observed associations, while demonstrating how diversity in affected individuals, cohorts, experimental models, and analytical approaches can catalyze discoveries.
ABSTRACT
Mendelian randomization (MR) provides valuable assessments of the causal effect of exposure on outcome, yet the application of conventional MR methods for mapping risk genes encounters new challenges. One of the issues is the limited availability of expression quantitative trait loci (eQTLs) as instrumental variables (IVs), hampering the estimation of sparse causal effects. Additionally, the often context- or tissue-specific eQTL effects challenge the MR assumption of consistent IV effects across eQTL and GWAS data. To address these challenges, we propose a multi-context multivariable integrative MR framework, mintMR, for mapping expression and molecular traits as joint exposures. It models the effects of molecular exposures across multiple tissues in each gene region, while simultaneously estimating across multiple gene regions. It uses eQTLs with consistent effects across more than one tissue type as IVs, improving IV consistency. A major innovation of mintMR involves employing multi-view learning methods to collectively model latent indicators of disease relevance across multiple tissues, molecular traits, and gene regions. The multi-view learning captures the major patterns of disease relevance and uses these patterns to update the estimated tissue relevance probabilities. The proposed mintMR iterates between performing a multi-tissue MR for each gene region and joint learning the disease-relevant tissue probabilities across gene regions, improving the estimation of sparse effects across genes. We apply mintMR to evaluate the causal effects of gene expression and DNA methylation for 35 complex traits using multi-tissue QTLs as IVs. The proposed mintMR controls genome-wide inflation and offers insights into disease mechanisms.
Subject(s)
Genetic Predisposition to Disease , Genome-Wide Association Study , Mendelian Randomization Analysis , Quantitative Trait Loci , Humans , Mendelian Randomization Analysis/methods , Genome-Wide Association Study/methods , Organ Specificity/genetics , Models, Genetic , Polymorphism, Single Nucleotide
ABSTRACT
Joint association analysis of multiple traits with multiple genetic variants can provide insight into genetic architecture and pleiotropy, improve trait prediction, and increase power for detecting association. Furthermore, some traits are naturally high-dimensional, e.g., images, networks, or longitudinally measured traits. Assessing significance for multitrait genetic association can be challenging, especially when the sample has population sub-structure and/or related individuals. Failure to adequately adjust for sample structure can lead to power loss and inflated type 1 error, and commonly used methods for assessing significance can work poorly with a large number of traits or be computationally slow. We developed JASPER, a fast, powerful, robust method for assessing significance of multitrait association with a set of genetic variants, in samples that have population sub-structure, admixture, and/or relatedness. In simulations, JASPER has higher power, better type 1 error control, and faster computation than existing methods, with the power and speed advantage of JASPER increasing with the number of traits. JASPER is potentially applicable to a wide range of association testing applications, including for multiple disease traits, expression traits, image-derived traits, and microbiome abundances. It allows for covariates, ascertainment, and rare variants and is robust to phenotype model misspecification. We apply JASPER to analyze gene expression in the Framingham Heart Study, where, compared to alternative approaches, JASPER finds more significant associations, including several that indicate pleiotropic effects, most of which replicate previous results, while others have not previously been reported. Our results demonstrate the promise of JASPER for powerful multitrait analysis in structured samples.
Subject(s)
Genetic Pleiotropy , Humans , Genome-Wide Association Study/methods , Phenotype , Gene Expression/genetics , Computer Simulation , Models, Genetic , Quantitative Trait Loci , Polymorphism, Single Nucleotide
ABSTRACT
The Drosophila melanogaster foraging (for) gene is a well-established example of a gene with major effects on behavior and natural variation. This gene is best known for underlying the behavioral strategies of rover and sitter foraging larvae, having been mapped and named for this phenotype. Nevertheless, in the last three decades an extensive array of studies describing for's role as a modifier of behavior in a wide range of phenotypes, in both Drosophila and other organisms, has emerged. Furthermore, recent work reveals new insights into the genetic and molecular underpinnings of how for affects these phenotypes. In this article, we discuss the history of the for gene and its role in natural variation in behavior, plasticity, and behavioral pleiotropy, with special attention to recent findings on the molecular structure and transcriptional regulation of this gene.
Subject(s)
Cyclic GMP-Dependent Protein Kinases/genetics , Drosophila Proteins/genetics , Drosophila melanogaster/physiology , Feeding Behavior/physiology , Gene-Environment Interaction , Genetic Pleiotropy , Animals , Ants/physiology , Drosophila melanogaster/genetics , Larva/physiology , Memory/physiology , Sleep/genetics , Sleep/physiology , Social Behavior , Thermotolerance/physiology
ABSTRACT
Angiotensin-converting enzyme 2 (ACE2) has dual functions, regulating cardiovascular physiology and serving as the receptor for coronaviruses. Bats, the only true flying mammals and natural viral reservoirs, have evolved positive alterations in traits related to both functions of ACE2. This suggests significant evolutionary changes in ACE2 during bat evolution. To test this hypothesis, we examine the selection pressure in ACE2 along the ancestral branch of all bats (AncBat-ACE2), where powered flight and bat-coronavirus coevolution occurred, and detect a positive selection signature. To assess the functional effects of positive selection, we resurrect AncBat-ACE2 and its mutant (AncBat-ACE2-mut) created by replacing the positively selected sites. Compared to AncBat-ACE2-mut, AncBat-ACE2 exhibits stronger enzymatic activity, enhances mice's performance in exercise fatigue, and shows lower affinity to severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). Our findings indicate the functional pleiotropy of positive selection in the ancient ACE2 of bats, providing an alternative hypothesis for the evolutionary origin of bats' defense against coronaviruses.
Subject(s)
Angiotensin-Converting Enzyme 2 , Chiroptera , Selection, Genetic , Chiroptera/virology , Chiroptera/genetics , Animals , Angiotensin-Converting Enzyme 2/genetics , Angiotensin-Converting Enzyme 2/metabolism , Mice , Genetic Pleiotropy , Evolution, Molecular , SARS-CoV-2/genetics , COVID-19/virology , COVID-19/genetics , Coronavirus/genetics , Humans , Phylogeny
ABSTRACT
Identifying the genetic basis of local adaptation and fitness trade-offs across environments is a central goal of evolutionary biology. Cold acclimation is an adaptive plastic response for surviving seasonal freezing, and costs of acclimation may be a general mechanism for fitness trade-offs across environments in temperate zone species. Starting with locally adapted ecotypes of Arabidopsis thaliana from Italy and Sweden, we examined the fitness consequences of a naturally occurring functional polymorphism in CBF2. This gene encodes a transcription factor that is a major regulator of cold-acclimated freezing tolerance and resides within a locus responsible for a genetic trade-off for long-term mean fitness. We estimated the consequences of alternate genotypes of CBF2 on 5-y mean fitness and fitness components at the native field sites by comparing near-isogenic lines with alternate genotypes of CBF2 to their genetic background ecotypes. The effects of CBF2 were validated at the nucleotide level using gene-edited lines in the native genetic backgrounds grown in simulated parental environments. The foreign CBF2 genotype in the local genetic background reduced long-term mean fitness in Sweden by more than 10%, primarily via effects on survival. In Italy, fitness was reduced by more than 20%, primarily via effects on fecundity. At both sites, the effects were temporally variable and much stronger in some years. The gene-edited lines confirmed that CBF2 encodes the causal variant underlying this genetic trade-off. Additionally, we demonstrated a substantial fitness cost of cold acclimation, which has broad implications for potential maladaptive responses to climate change.
Subject(s)
Arabidopsis Proteins , Arabidopsis , Arabidopsis/genetics , Mutation , Acclimatization/genetics , Arabidopsis Proteins/genetics , Transcription Factors/genetics , Cold Temperature , Genetic Fitness
ABSTRACT
Cytokines have long been considered promising cancer immunotherapy agents due to their endogenous role in activating and proliferating lymphocytes. However, since the initial FDA approvals of Interleukin-2 (IL-2) and Interferon-α (IFNα) for oncology over 30 years ago, cytokines have achieved little success in the clinic due to narrow therapeutic windows and dose-limiting toxicities. This is attributable to the discrepancy between the localized, regulated manner in which cytokines are deployed endogenously versus the systemic, untargeted administration used to date in most exogenous cytokine therapies. Furthermore, cytokines' ability to stimulate multiple cell types, often with paradoxical effects, may present significant challenges for their translation into effective therapies. Recently, protein engineering has emerged as a tool to address the shortcomings of first-generation cytokine therapies. In this perspective, we contextualize cytokine engineering strategies such as partial agonism, conditional activation and intratumoral retention through the lens of spatiotemporal regulation. By controlling the time, place, specificity, and duration of cytokine signaling, protein engineering can allow exogenous cytokine therapies to more closely approach their endogenous exposure profile, ultimately moving us closer to unlocking their full therapeutic potential.
Subject(s)
Cytokines , Neoplasms , Humans , Cytokines/metabolism , Neoplasms/drug therapy , Protein Engineering , Immunotherapy
ABSTRACT
Hox genes are important regulators in animal development. They often show a mosaic of conserved (e.g., longitudinal axis patterning) and lineage-specific novel functions (e.g., development of skeletal, sensory, or locomotory systems). Despite extensive research over the past decades, it remains controversial at which node in the animal tree of life the Hox cluster evolved. Its presence already in the last common metazoan ancestor has been proposed, although the genomes of both putative earliest extant metazoan offshoots, the ctenophores and the poriferans, are devoid of Hox sequences. The lack of Hox genes in the supposedly "simple"-built poriferans and their low number in cnidarians and the basally branching bilaterians, the xenacoelomorphs, seem to support the classical notion that the number of Hox genes is correlated with the degree of animal complexity. However, the 4-fold increase of the Hox cluster in xiphosurans, a basally branching chelicerate clade, as well as the situation in some teleost fishes that show a multitude of Hox genes compared to, e.g., humans, demonstrates that there is no direct correlation per se between organismal complexity and Hox number. Traditional approaches have tried to base homology on the morphological level on shared expression profiles of individual genes, but recent data have shown that, in particular with respect to Hox and other regulatory genes, complex gene-gene interactions rather than expression signatures of individual genes alone are responsible for shaping morphological traits during ontogeny. Accordingly, for sound homology assessments and reconstructions of character evolution at the organ system level, additional independent datasets (e.g., morphological, developmental) need to be included in any such analyses. If supported by solid data, proposed structural homology should be regarded as valid and not be rejected solely on the grounds of non-parsimonious distribution of the character over a given phylogenetic topology.
Subject(s)
Cnidaria , Homeodomain Proteins , Animals , Humans , Phylogeny , Homeodomain Proteins/genetics , Evolution, Molecular , Cnidaria/genetics , Genes, Homeobox/genetics , Multigene Family/genetics
ABSTRACT
Genome-wide association studies (GWASs) across thousands of traits have revealed the pervasive pleiotropy of trait-associated genetic variants. While methods have been proposed to characterize pleiotropic components across groups of phenotypes, scaling these approaches to ultra-large-scale biobanks has been challenging. Here, we propose FactorGo, a scalable variational factor analysis model to identify and characterize pleiotropic components using biobank GWAS summary data. In extensive simulations, we observe that FactorGo outperforms the state-of-the-art (model-free) approach tSVD in capturing latent pleiotropic factors across phenotypes while maintaining a similar computational cost. We apply FactorGo to estimate 100 latent pleiotropic factors from GWAS summary data of 2,483 phenotypes measured in European-ancestry Pan-UK BioBank individuals (N = 420,531). Next, we find that factors from FactorGo are more enriched with relevant tissue-specific annotations than those identified by tSVD (p = 2.58E-10) and validate our approach by recapitulating brain-specific enrichment for BMI and the height-related connection between reproductive system and muscular-skeletal growth. Finally, our analyses suggest shared etiologies between rheumatoid arthritis and periodontal condition in addition to alkaline phosphatase as a candidate prognostic biomarker for prostate cancer. Overall, FactorGo improves our biological understanding of shared etiologies across thousands of GWASs.
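As a point of reference for the model-free baseline mentioned above, a truncated SVD (tSVD) of a phenotype-by-variant matrix of GWAS z-scores can be sketched in a few lines. This is not FactorGo itself, which is a variational factor analysis model; the function name `truncated_svd`, the matrix layout (rows are phenotypes, columns are variants), and the toy data are assumptions for illustration.

```python
import numpy as np


def truncated_svd(z_scores, k):
    """Return the top-k singular triplets of a phenotype-by-variant matrix.

    Rows are assumed to be phenotypes and columns variants (a hypothetical
    layout). The k left singular vectors act as phenotype-level latent
    factors and the k right singular vectors as variant loadings.
    """
    u, s, vt = np.linalg.svd(np.asarray(z_scores, dtype=float),
                             full_matrices=False)
    return u[:, :k], s[:k], vt[:k, :]


# Toy rank-1 z-score matrix: 3 phenotypes by 3 variants (made-up values).
z = np.outer([1.0, 2.0, 3.0], [1.0, 0.0, -1.0])
u, s, vt = truncated_svd(z, k=1)

# With k equal to the true rank, the factors reconstruct the input.
reconstruction = (u * s) @ vt
```

Unlike a probabilistic factor model, this decomposition treats every z-score as equally precise, so it does not account for differences in sample size or estimation noise across studies.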