ABSTRACT
An increasing number of individuals with intellectual developmental disorder (IDD) and heterozygous variants in BCL11A are identified, yet our knowledge of manifestations and mutational spectrum is lacking. To address this, we performed detailed analysis of 42 individuals with BCL11A-related IDD (BCL11A-IDD, a.k.a. Dias-Logan syndrome) ascertained through an international collaborative network, and reviewed 35 additional previously reported patients. Analysis of 77 affected individuals identified 60 unique disease-causing variants (30 frameshift, 7 missense, 6 splice-site, 17 stop-gain) and 8 unique BCL11A microdeletions. We define the most prevalent features of BCL11A-IDD: IDD, postnatal-onset microcephaly, hypotonia, behavioral abnormalities, autism spectrum disorder, and persistence of fetal hemoglobin (HbF), and identify autonomic dysregulation as a new feature. BCL11A-IDD is distinguished from 2p16 microdeletion syndrome, which has a higher incidence of congenital anomalies. Our results underscore BCL11A as an important transcription factor in human hindbrain development, identifying a previously underrecognized phenotype of a small brainstem with a reduced pons/medulla ratio. Genotype-phenotype correlation revealed an isoform-dependent trend in severity of truncating variants: those affecting all isoforms are associated with higher frequency of hypotonia, and those affecting the long (BCL11A-L) and extra-long (-XL) isoforms, sparing the short (-S), are associated with higher frequency of postnatal microcephaly. With the largest international cohort to date, this study highlights persistence of fetal hemoglobin as a consistent biomarker and hindbrain abnormalities as a common feature. It contributes significantly to our understanding of BCL11A-IDD through an extensive unbiased multi-center assessment, providing valuable insights for diagnosis, management and counselling, and into BCL11A's role in brain development.
ABSTRACT
The Global Alliance for Genomics and Health (GA4GH) Phenopacket Schema was released in 2022 and approved by ISO as a standard for sharing clinical and genomic information about an individual, including phenotypic descriptions, numerical measurements, genetic information, diagnoses, and treatments. A phenopacket can be used as an input file for software that supports phenotype-driven genomic diagnostics and for algorithms that facilitate patient classification and stratification for identifying new diseases and treatments. There has been a great need for a collection of phenopackets to test software pipelines and algorithms. Here, we present Phenopacket Store. Phenopacket Store v.0.1.19 includes 6,668 phenopackets representing 475 Mendelian and chromosomal diseases associated with 423 genes and 3,834 unique pathogenic alleles curated from 959 different publications. This represents the first large-scale collection of case-level, standardized phenotypic information derived from case reports in the literature with detailed descriptions of the clinical data and will be useful for many purposes, including the development and testing of software for prioritizing genes and diseases in diagnostic genomics, machine learning analysis of clinical phenotype data, patient stratification, and genotype-phenotype correlations. This corpus also provides best-practice examples for curating literature-derived data using the GA4GH Phenopacket Schema.
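As a rough illustration of the kind of record such a collection holds, the sketch below builds a minimal phenopacket-like JSON document with the Python standard library and round-trips it through serialization. The field names follow the GA4GH Phenopacket Schema v2 JSON layout as we understand it; the identifiers, the curator name, the resource version placeholder, and the `build_phenopacket` helper are hypothetical and are not taken from Phenopacket Store. The output is not validated against the official schema.

```python
import json


def build_phenopacket(packet_id, subject_id, sex, hpo_terms):
    """Return a minimal phenopacket-like dict (illustrative only, not schema-validated).

    hpo_terms is a list of (term_id, label) pairs, e.g. ("HP:0000252", "Microcephaly").
    """
    return {
        "id": packet_id,
        "subject": {"id": subject_id, "sex": sex},
        "phenotypicFeatures": [
            {"type": {"id": term_id, "label": label}}
            for term_id, label in hpo_terms
        ],
        "metaData": {
            "created": "2024-01-01T00:00:00Z",   # placeholder timestamp
            "createdBy": "example-curator",       # hypothetical curator name
            "resources": [
                {
                    "id": "hp",
                    "name": "human phenotype ontology",
                    "url": "http://purl.obolibrary.org/obo/hp.owl",
                    "version": "unspecified",     # placeholder, not a real release
                    "namespacePrefix": "HP",
                    "iriPrefix": "http://purl.obolibrary.org/obo/HP_",
                }
            ],
            "phenopacketSchemaVersion": "2.0",
        },
    }


# Build one hypothetical case, serialize it, and read it back.
packet = build_phenopacket(
    "example-1", "individual-1", "FEMALE", [("HP:0000252", "Microcephaly")]
)
text = json.dumps(packet, indent=2)
restored = json.loads(text)
```

Software that consumes phenopackets would typically read files like `text` above; a real pipeline should rely on the official schema tooling for validation rather than on hand-built dictionaries.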
ABSTRACT
Purpose: Rapid genetic testing in the critical care setting may guide diagnostic evaluation, direct therapies, and help families and care providers make informed decisions about goals of care. We tested whether a simplified DNA extraction and library preparation process would enable us to perform ultra-rapid assessment of genetic risk for a Mendelian condition, based on information from an affected sibling, using long-read genome sequencing and targeted analysis. Methods: Following extraction of DNA from cord blood and rapid library preparation, genome sequencing was performed on an Oxford Nanopore PromethION. FASTQ files were generated from original sequencing data in near real-time and aligned to a reference genome. Variant calling and analysis were performed at timed intervals. Results: We optimized the DNA extraction and library preparation methods to create sufficient library for sequencing from 500 µL of blood. Real-time, targeted analysis was performed to determine that the newborn was neither affected nor a heterozygote for variants underlying a Mendelian condition. Phasing of the target region and prior knowledge of the affected haplotypes supported our interpretation despite a low level of coverage at 3 hours of life. Conclusion: This proof-of-concept experiment demonstrates how prior knowledge of haplotype structure or familial variants can be used to rapidly evaluate an individual at risk for a genetic disease. While ultra-rapid sequencing remains both complex and cost prohibitive, our method is more easily automated than prior approaches and uses smaller volumes of blood, thus may be more easily adopted for future studies of ultra-rapid genome sequencing in the clinical setting.
ABSTRACT
Biallelic pathogenic variants in UQCRFS1 underlie a rare form of isolated mitochondrial complex III deficiency associated with lactic acidosis and a distinctive scalp alopecia previously described in two unrelated probands. Here, we describe a participant in the Undiagnosed Diseases Network (UDN) with a dual diagnosis of two autosomal recessive disorders revealed by genome sequencing: UQCRFS1-related mitochondrial complex III deficiency and GJA8-related cataracts. Both pathogenic variants have been reported before: UQCRFS1 (NM_006003.3:c.215-1G>C, p.Val72_Thr81del10) in a case with mitochondrial complex III deficiency and GJA8 (NM_005267.5:c.736G>T, p.Glu246*) as a somatic change in aged cornea leading to decreased junctional coupling. A multi-modal approach combining enzyme assays and cellular proteomics analysis provided clear evidence of complex III respiratory chain dysfunction and low abundance of the Rieske iron-sulfur protein, validating the pathogenic effect of the UQCRFS1 variant. This report extends the genotypic and phenotypic spectrum for these two rare disorders and highlights the utility of deep phenotyping and genomics data to achieve diagnosis and insights into rare disease.
ABSTRACT
Bicuspid aortic valve (BAV) is the most common congenital heart lesion with an estimated population prevalence of 1%. We hypothesize that specific gene variants predispose to early-onset complications of BAV (EBAV). We analyzed whole-exome sequences (WESs) to identify rare coding variants that contribute to BAV disease in 215 EBAV-affected families. Predicted damaging variants in candidate genes with moderate or strong supportive evidence to cause developmental cardiac phenotypes were present in 107 EBAV-affected families (50% of total), including genes that cause BAV (9%) or heritable thoracic aortic disease (HTAD, 19%). After appropriate filtration, we also identified 129 variants in 54 candidate genes that are associated with autosomal-dominant congenital heart phenotypes, including recurrent deleterious variation of FBN2, MYH6, channelopathy genes, and type 1 and 5 collagen genes. These findings confirm our hypothesis that unique rare genetic variants drive early-onset presentations of BAV disease.
Subjects
Aortic Valve, Bicuspid Aortic Valve Disease, Exome Sequencing, Heart Valve Diseases, Pedigree, Humans, Bicuspid Aortic Valve Disease/genetics, Bicuspid Aortic Valve Disease/pathology, Aortic Valve/abnormalities, Aortic Valve/pathology, Heart Valve Diseases/genetics, Male, Female, Genetic Predisposition to Disease, Age of Onset, Phenotype, Exome/genetics, Adult, Myosin Heavy Chains/genetics, Fibrillin-2/genetics, Cardiac Myosins/genetics
ABSTRACT
To identify modifier loci underlying variation in body mass index (BMI) in persons with cystic fibrosis (pwCF), we performed a genome-wide association study (GWAS). Utilizing longitudinal height and weight data, along with demographic information and covariates from 4,393 pwCF, we calculated AvgBMIz representing the average of per-quarter BMI Z scores. The GWAS incorporated 9.8M single nucleotide polymorphisms (SNPs) with a minor allele frequency (MAF) > 0.005 extracted from whole-genome sequencing (WGS) of each study subject. We observed genome-wide significant association with a variant in FTO (FaT mass and Obesity-associated gene; rs28567725; p value = 1.21e-08; MAF = 0.41, β = 0.106; n = 4,393 individuals) and a variant within ADAMTS5 (A Disintegrin And Metalloproteinase with ThromboSpondin motifs 5; rs162500; p value = 2.11e-10; MAF = 0.005, β = -0.768; n = 4,085 pancreatic-insufficient individuals). Notably, BMI-associated variants in ADAMTS5 occur on a haplotype that is much more common in African (AFR, MAF = 0.183) than European (EUR, MAF = 0.006) populations (1000 Genomes project). A polygenic risk score (PRS) calculated using 924 SNPs (excluding 17 in FTO) showed significant association with AvgBMIz (p value = 2.2e-16; r² = 0.03). Association between variants in FTO and the PRS correlation reveals similarities in the genetic architecture of BMI in CF and the general population. Inclusion of Black individuals in whom the single-gene disorder CF is much less common but genomic diversity is greater facilitated detection of association with variants that are in LD with functional SNPs in ADAMTS5. Our results illustrate the importance of population diversity, particularly when attempting to identify variants that manifest only under certain physiologic conditions.
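The abstract defines AvgBMIz only as the average of per-quarter BMI Z scores. One plausible reading can be sketched as follows, assuming calendar quarters and assuming that multiple measurements within a quarter are first averaged to a single quarterly value; both choices, the `avg_bmi_z` name, and the toy data are our assumptions and may differ from the study's actual procedure.

```python
from collections import defaultdict
from datetime import date
from statistics import mean


def avg_bmi_z(measurements):
    """Average BMI Z score across quarters.

    measurements: iterable of (date, bmi_z) pairs for one individual.
    Assumption: values are averaged within each calendar quarter first,
    then the quarterly means are averaged with equal weight.
    """
    by_quarter = defaultdict(list)
    for when, z in measurements:
        quarter = (when.year, (when.month - 1) // 3 + 1)  # e.g. (2020, 1) for Jan-Mar
        by_quarter[quarter].append(z)
    if not by_quarter:
        raise ValueError("no measurements supplied")
    return mean(mean(zs) for zs in by_quarter.values())


# Toy data (not from the study): two visits in Q1 2020, one in Q2 2020.
example = [
    (date(2020, 1, 15), 0.5),
    (date(2020, 2, 20), 1.5),   # same quarter as the first visit, quarter mean 1.0
    (date(2020, 5, 3), -1.0),   # second quarter, quarter mean -1.0
]
result = avg_bmi_z(example)     # mean of the two quarterly means
```

Averaging within quarters before averaging across them keeps an individual with many clinic visits in one quarter from dominating the summary, which is presumably why a per-quarter scheme was used.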
Subjects
Alpha-Ketoglutarate-Dependent Dioxygenase FTO, Body Mass Index, Cystic Fibrosis, Genome-Wide Association Study, Polymorphism, Single Nucleotide, Humans, Cystic Fibrosis/genetics, Male, Female, Alpha-Ketoglutarate-Dependent Dioxygenase FTO/genetics, Adult, ADAMTS5 Protein/genetics, Child, Adolescent, Gene Frequency, Haplotypes, Genetic Predisposition to Disease, Young Adult, Obesity/genetics, Genes, Modifier
ABSTRACT
The precise regulation of DNA replication is vital for cellular division and genomic integrity. Central to this process is the replication factor C (RFC) complex, encompassing five subunits, which loads proliferating cell nuclear antigen onto DNA to facilitate the recruitment of replication and repair proteins and enhance DNA polymerase processivity. While RFC1's role in cerebellar ataxia, neuropathy, and vestibular areflexia syndrome (CANVAS) is known, the contributions of the RFC2-5 subunits to human Mendelian disorders are largely unexplored. Our research links bi-allelic variants in RFC4, encoding a core RFC complex subunit, to an undiagnosed disorder characterized by incoordination and muscle weakness, hearing impairment, and decreased body weight. We discovered across nine affected individuals rare, conserved, predicted pathogenic variants in RFC4, all likely to disrupt the C-terminal domain indispensable for RFC complex formation. Analysis of a previously determined cryo-EM structure of RFC bound to proliferating cell nuclear antigen suggested that the variants disrupt interactions within RFC4 and/or destabilize the RFC complex. Cellular studies using RFC4-deficient HeLa cells and primary fibroblasts demonstrated decreased RFC4 protein, compromised stability of the other RFC complex subunits, and perturbed RFC complex formation. Additionally, functional studies of the RFC4 variants affirmed diminished RFC complex formation, and cell cycle studies suggested perturbation of DNA replication and cell cycle progression. Our integrated approach of combining in silico, structural, cellular, and functional analyses establishes compelling evidence that bi-allelic loss-of-function RFC4 variants contribute to the pathogenesis of this multisystemic disorder. These insights broaden our understanding of the RFC complex and its role in human health and disease.
Subjects
Replication Protein C, Humans, Replication Protein C/genetics, Replication Protein C/metabolism, Male, HeLa Cells, Female, Phenotype, DNA Replication/genetics, Adult, Mutation, Proliferating Cell Nuclear Antigen/metabolism, Proliferating Cell Nuclear Antigen/genetics, Alleles
ABSTRACT
BACKGROUND: Congenital Myasthenic Syndromes (CMS) are rare genetic diseases, which share as a common denominator muscle fatigability due to failure of neuromuscular transmission. A distinctive clinical feature of presynaptic CMS variants caused by defects of the synthesis of acetylcholine is the association with life-threatening episodes of apnea. One of these variants is caused by mutations in the SLC5A7 gene, which encodes the sodium-dependent HC-3 high-affinity choline transporter 1 (CHT1). To our knowledge there are no published cases of this CMS type in Latin America. CASE PRESENTATION: We present two cases of CHT1-CMS. Both patients were males presenting with repeated episodes of apnea, hypotonia, weakness, ptosis, mild ophthalmoparesis, and bulbar deficit. The first case also presented one isolated seizure, while the second case showed global developmental delay. Both cases exhibited incomplete improvement with pyridostigmine treatment. CONCLUSIONS: This report emphasizes the broad incidence of CMS with episodic apnea caused by mutations in the SLC5A7 gene and the frequent association of this condition with serious manifestations of central nervous system involvement.
Subjects
Myasthenic Syndromes, Congenital, Humans, Myasthenic Syndromes, Congenital/genetics, Male, Mutation, Symporters/genetics, Child, Child, Preschool
ABSTRACT
BACKGROUND: Primary congenital glaucoma (PCG) affects approximately 1 in 10,000 live born infants in the United States (U.S.). PCG has an autosomal recessive inheritance pattern, and variable expressivity and reduced penetrance have been reported. Likely causal variants in the most commonly mutated gene, CYP1B1, are less prevalent in the U.S., suggesting that alternative genes may contribute to the condition. This study utilized exome sequencing to investigate the genetic architecture of PCG in the U.S. and to identify novel genes and variants. METHODS: We studied 37 family trios where infants had PCG and were part of the National Birth Defects Prevention Study (births 1997-2011), a U.S. multicenter study of birth defects. Samples underwent exome sequencing and sequence reads were aligned to the human reference sample (NCBI build 37/hg19). Variant filtration was conducted under de novo and Mendelian inheritance models using GEMINI. RESULTS: Among candidate variants, CYP1B1 was most represented (five trios, 13.5%). Twelve probands (32%) had potentially pathogenic variants in other genes not previously linked to PCG but important in eye development and/or to underlie Mendelian conditions with potential phenotypic overlap (e.g., CRYBB2, RXRA, GLI2). CONCLUSION: Variation in the genes identified in this population-based study may help to further explain the genetics of PCG.
Subjects
Cytochrome P-450 CYP1B1, Exome Sequencing, Exome, Glaucoma, Humans, Glaucoma/genetics, Glaucoma/congenital, Cytochrome P-450 CYP1B1/genetics, Female, Male, Exome Sequencing/methods, United States, Exome/genetics, Mutation/genetics, Genetic Predisposition to Disease, Infant, Infant, Newborn
ABSTRACT
ANK3 encodes ankyrin-G, a protein involved in neuronal development and signaling. Alternative splicing gives rise to three ankyrin-G isoforms comprising different domains with distinct expression patterns. Mono- or biallelic ANK3 variants are associated with non-specific syndromic intellectual disability in 14 individuals (seven with monoallelic and seven with biallelic variants). In this study, we describe the clinical features of 13 additional individuals and review the data on a total of 27 individuals (16 individuals with monoallelic and 11 with biallelic ANK3 variants) and demonstrate that the phenotype for biallelic variants is more severe. The phenotypic features include language delay (92%), autism spectrum disorder (76%), intellectual disability (78%), hypotonia (65%), motor delay (68%), attention deficit disorder (ADD) or attention deficit hyperactivity disorder (ADHD) (57%), sleep disturbances (50%), aggressivity/self-injury (37.5%), and epilepsy (35%). A notable phenotypic difference was presence of ataxia in three individuals with biallelic variants, but in none of the individuals with monoallelic variants. While the majority of the monoallelic variants are predicted to result in a truncated protein, biallelic variants are almost exclusively missense. Moreover, mono- and biallelic variants appear to be localized differently across the three different ankyrin-G isoforms, suggesting isoform-specific pathological mechanisms.
Subjects
Ankyrins, Intellectual Disability, Neurodevelopmental Disorders, Adolescent, Adult, Child, Child, Preschool, Female, Humans, Infant, Male, Alleles, Ankyrins/genetics, Attention Deficit Disorder with Hyperactivity/genetics, Autism Spectrum Disorder/genetics, Epilepsy/genetics, Genetic Association Studies, Genetic Predisposition to Disease, Genotype, Intellectual Disability/genetics, Intellectual Disability/pathology, Language Development Disorders/genetics, Mutation/genetics, Phenotype, Neurodevelopmental Disorders/genetics
ABSTRACT
Since the first novel gene discovery for a Mendelian condition was made via exome sequencing, the rapid increase in the number of genes known to underlie Mendelian conditions coupled with the adoption of exome (and more recently, genome) sequencing by diagnostic testing labs has changed the landscape of genomic testing for rare diseases. Specifically, many individuals suspected to have a Mendelian condition are now routinely offered clinical ES. This commonly results in a precise genetic diagnosis but frequently overlooks the identification of novel candidate genes. Such candidates are also less likely to be identified in the absence of large-scale gene discovery research programs. Accordingly, clinical laboratories have both the opportunity, and some might argue a responsibility, to contribute to novel gene discovery, which should, in turn, increase the diagnostic yield for many conditions. However, clinical diagnostic laboratories must necessarily balance priorities for throughput, turnaround time, cost efficiency, clinician preferences, and regulatory constraints and often do not have the infrastructure or resources to effectively participate in either clinical translational or basic genome science research efforts. For these and other reasons, many laboratories have historically refrained from broadly sharing potentially pathogenic variants in novel genes via networks such as Matchmaker Exchange, much less reporting such results to ordering providers. Efforts to report such results are further complicated by a lack of guidelines for clinical reporting and interpretation of variants in novel candidate genes. Nevertheless, there are myriad benefits for many stakeholders, including patients/families, clinicians, and researchers, if clinical laboratories systematically and routinely identify, share, and report novel candidate genes. 
To facilitate this change in practice, we developed criteria for triaging, sharing, and reporting novel candidate genes that are most likely to be promptly validated as underlying a Mendelian condition and translated to use in clinical settings.
Subjects
Genetic Testing, Genomics, Humans, Exome/genetics, Exome Sequencing/methods, Genetic Predisposition to Disease, Genetic Testing/methods, Genetic Testing/standards, Genetic Variation, Genome, Human/genetics, Genomics/methods
ABSTRACT
The Global Alliance for Genomics and Health (GA4GH) Phenopacket Schema was released in 2022 and approved by ISO as a standard for sharing clinical and genomic information about an individual, including phenotypic descriptions, numerical measurements, genetic information, diagnoses, and treatments. A phenopacket can be used as an input file for software that supports phenotype-driven genomic diagnostics and for algorithms that facilitate patient classification and stratification for identifying new diseases and treatments. There has been a great need for a collection of phenopackets to test software pipelines and algorithms. Here, we present phenopacket-store. Version 0.1.12 of phenopacket-store includes 4916 phenopackets representing 277 Mendelian and chromosomal diseases associated with 236 genes, and 2872 unique pathogenic alleles curated from 605 different publications. This represents the first large-scale collection of case-level, standardized phenotypic information derived from case reports in the literature with detailed descriptions of the clinical data and will be useful for many purposes, including the development and testing of software for prioritizing genes and diseases in diagnostic genomics, machine learning analysis of clinical phenotype data, patient stratification, and genotype-phenotype correlations. This corpus also provides best-practice examples for curating literature-derived data using the GA4GH Phenopacket Schema.
ABSTRACT
CSMD1 (Cub and Sushi Multiple Domains 1) is a well-recognized regulator of the complement cascade, an important component of the innate immune response. CSMD1 is highly expressed in the central nervous system (CNS) where emergent functions of the complement pathway modulate neural development and synaptic activity. While a genetic risk factor for neuropsychiatric disorders, the role of CSMD1 in neurodevelopmental disorders is unclear. Through international variant sharing, we identified inherited biallelic CSMD1 variants in eight individuals from six families of diverse ancestry who present with global developmental delay, intellectual disability, microcephaly, and polymicrogyria. We modeled CSMD1 loss-of-function (LOF) pathogenesis in early-stage forebrain organoids differentiated from CSMD1 knockout human embryonic stem cells (hESCs). We show that CSMD1 is necessary for neuroepithelial cytoarchitecture and synchronous differentiation. In summary, we identified a critical role for CSMD1 in brain development and biallelic CSMD1 variants as the molecular basis of a previously undefined neurodevelopmental disorder.
Subjects
Intellectual Disability, Membrane Proteins, Humans, Intellectual Disability/genetics, Intellectual Disability/pathology, Membrane Proteins/genetics, Membrane Proteins/metabolism, Female, Male, Neurodevelopmental Disorders/genetics, Alleles, Malformations of Cortical Development/genetics, Malformations of Cortical Development/pathology, Child, Child, Preschool, Cell Differentiation/genetics, Tumor Suppressor Proteins
ABSTRACT
Thyrotropin (TSH) is the master regulator of thyroid gland growth and function. Resistance to TSH (RTSH) describes conditions with reduced sensitivity to TSH. Dominantly inherited RTSH has been linked to a locus on chromosome 15q, but its genetic basis has remained elusive. Here we show that non-coding mutations in a (TTTG)4 short tandem repeat (STR) underlie dominantly inherited RTSH in all 82 affected participants from 12 unrelated families. The STR is contained in a primate-specific Alu retrotransposon with thyroid-specific cis-regulatory chromatin features. Fiber-seq and RNA-seq studies revealed that the mutant STR activates a thyroid-specific enhancer cluster, leading to haplotype-specific upregulation of the bicistronic MIR7-2/MIR1179 locus 35 kb downstream and overexpression of its microRNA products in the participants' thyrocytes. An imbalance in signaling pathways targeted by these micro-RNAs provides a working model for this cause of RTSH. This finding broadens our current knowledge of genetic defects altering pituitary-thyroid feedback regulation.
Subjects
Chromosomes, Human, Pair 15, Enhancer Elements, Genetic, MicroRNAs, Microsatellite Repeats, Mutation, Thyrotropin, Animals, Female, Humans, Male, Chromosomes, Human, Pair 15/genetics, MicroRNAs/genetics, Microsatellite Repeats/genetics, Pedigree, Primates/genetics, Thyroid Gland/metabolism, Thyrotropin/genetics
ABSTRACT
BACKGROUND: Cystic fibrosis (CF) is caused by deleterious variants in each CFTR gene. We investigated the utility of whole-gene CFTR sequencing when fewer than two pathogenic or likely pathogenic (P/LP) variants were detected by conventional testing (sequencing of exons and flanking introns) of CFTR. METHODS: Individuals with features of CF and a CF-diagnostic sweat chloride concentration with zero or one P/LP variants identified by conventional testing enrolled in the CF Mutation Analysis Program (MAP) underwent whole-gene CFTR sequencing. Replication was performed on individuals enrolled in the CF Genome Project (CFGP), followed by phenotype review and interrogation of other genes. RESULTS: Whole-gene sequencing identified a second P/LP variant in 20/43 MAP enrollees (47%) and 10/22 CFGP enrollees (45%) who had one P/LP variant after conventional testing. No P/LP variants were detected when conventional testing was negative (MAP: n = 43; CFGP: n = 13). Genome-wide analysis was unable to find an alternative etiology in CFGP participants with fewer than two P/LP CFTR variants and CF could not be confirmed in 91% following phenotype re-review. CONCLUSIONS: Whole-gene CFTR analysis is beneficial in individuals with one previously-identified P/LP variant and a CF-diagnostic sweat chloride. Negative conventional CFTR testing indicates that the phenotype should be re-evaluated.
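The yields quoted in the RESULTS can be reproduced with simple arithmetic. The short sketch below recomputes the two percentages from the counts given in the abstract; the `percent_yield` helper is a hypothetical name introduced here for illustration.

```python
def percent_yield(hits, total):
    """Percentage of individuals in whom a second P/LP variant was found."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * hits / total


# Counts taken from the abstract: individuals with one P/LP variant
# after conventional testing, by cohort.
map_yield = percent_yield(20, 43)    # CF Mutation Analysis Program
cfgp_yield = percent_yield(10, 22)   # CF Genome Project

print(f"MAP:  {map_yield:.0f}%")
print(f"CFGP: {cfgp_yield:.0f}%")
```

Rounded to whole percentages, the results match the 47% and 45% reported for the two cohorts.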
Subjects
Cystic Fibrosis Transmembrane Conductance Regulator, Cystic Fibrosis, Humans, Cystic Fibrosis Transmembrane Conductance Regulator/genetics, Cystic Fibrosis/genetics, Female, Male, Phenotype, Mutation, Child, DNA Mutational Analysis, Adolescent, Adult, Genetic Testing/methods, Child, Preschool
ABSTRACT
Autosomal dominant polycystic kidney disease (ADPKD) is a well-described condition in which ~80% of cases have a genetic explanation, while the genetic basis of sporadic cystic kidney disease in adults remains unclear in ~30% of cases. This study aimed to identify novel genes associated with polycystic kidney disease (PKD) in patients with sporadic cystic kidney disease in which a clear genetic change was not identified in established genes. A next-generation sequencing panel analyzed known genes related to renal cysts in 118 sporadic cases, followed by whole-genome sequencing on 47 unrelated individuals without identified candidate variants. Three male patients were found to have rare missense variants in the X-linked gene Cilia And Flagella Associated Protein 47 (CFAP47). CFAP47 was expressed in primary cilia of human renal tubules, and knockout mice exhibited vacuolation of tubular cells and tubular dilation, providing evidence that CFAP47 is a causative gene involved in cyst formation. This discovery of CFAP47 as a newly identified gene associated with PKD, displaying X-linked inheritance, emphasizes the need for further cases to understand the role of CFAP47 in PKD.
ABSTRACT
Since genotype imputation was introduced, researchers have been relying on the estimated imputation quality from imputation software to perform post-imputation quality control (QC). However, this quality estimate (denoted as Rsq) performs less well for lower-frequency variants. We recently published MagicalRsq, a machine-learning-based imputation quality calibration, which leverages additional typed markers from the same cohort and outperforms Rsq as a QC metric. In this work, we extended the original MagicalRsq to allow cross-cohort model training and named the new model MagicalRsq-X. We removed the cohort-specific estimated minor allele frequency and included linkage disequilibrium scores and recombination rates as additional features. Leveraging whole-genome sequencing data from TOPMed, specifically participants in the BioMe, JHS, WHI, and MESA studies, we performed comprehensive cross-cohort evaluations for predominantly European and African ancestral individuals based on their inferred global ancestry with the 1000 Genomes and Human Genome Diversity Project data as reference. Our results suggest MagicalRsq-X outperforms Rsq in almost every setting, with 7.3%-14.4% improvement in squared Pearson correlation with true R2, corresponding to 85-218 K variant gains. We further developed a metric to quantify the genetic distances of a target cohort relative to a reference cohort and showed that such metric largely explained the performance of MagicalRsq-X models. Finally, we found MagicalRsq-X saved up to 53 known genome-wide significant variants in one of the largest blood cell trait GWASs that would be missed using the original Rsq for QC. In conclusion, MagicalRsq-X shows superiority for post-imputation QC and benefits genetic studies by distinguishing well and poorly imputed lower-frequency variants.
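The evaluation metric named in the abstract, the squared Pearson correlation between an estimated quality score and the true R2, can be sketched in a few lines of standard-library Python. This is only an illustration of the metric itself, not of MagicalRsq-X; the `squared_pearson` function name and the toy values are assumptions made for the example.

```python
def squared_pearson(xs, ys):
    """Squared Pearson correlation between two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of centered cross-products and squares.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return (sxy * sxy) / (sxx * syy)


# Toy values (not from the study): true imputation R2 per variant and
# a quality estimate for the same variants.
true_r2 = [0.90, 0.40, 0.75, 0.10]
estimated = [0.80, 0.50, 0.70, 0.30]
score = squared_pearson(estimated, true_r2)
```

A quality metric whose values track the true R2 more closely yields a higher score, which is the sense in which one QC metric is said to outperform another here.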