RESUMEN
AIMS/HYPOTHESIS: Several studies have reported associations between specific proteins and type 2 diabetes risk in European populations. To better understand the role played by proteins in type 2 diabetes aetiology across diverse populations, we conducted a large proteome-wide association study using genetic instruments across four racial and ethnic groups: African; Asian; Hispanic/Latino; and European. METHODS: Genome and plasma proteome data from the Multi-Ethnic Study of Atherosclerosis (MESA) study involving 182 African, 69 Asian, 284 Hispanic/Latino and 409 European individuals residing in the USA were used to establish protein prediction models by using potentially associated cis- and trans-SNPs. The models were applied to genome-wide association study summary statistics of 250,127 type 2 diabetes cases and 1,222,941 controls from different racial and ethnic populations. RESULTS: We identified three, 44 and one protein associated with type 2 diabetes risk in Asian, European and Hispanic/Latino populations, respectively. Meta-analysis identified 40 proteins associated with type 2 diabetes risk across the populations, including well-established as well as novel proteins not yet implicated in type 2 diabetes development. CONCLUSIONS/INTERPRETATION: Our study improves our understanding of the aetiology of type 2 diabetes in diverse populations. DATA AVAILABILITY: The summary statistics of multi-ethnic type 2 diabetes GWAS of MVP, DIAMANTE, Biobank Japan and other studies are available from The database of Genotypes and Phenotypes (dbGaP) under accession number phs001672.v3.p1. MESA genetic, proteome and covariate data can be accessed through dbGaP under phs000209.v13.p3. All code is available on GitHub ( https://github.com/Arthur1021/MESA-1K-PWAS ).
RESUMEN
Drug-induced QT prolongation (diLQTS), and subsequent risk of torsade de pointes, is a major concern with use of many medications, including for non-cardiac conditions. The possibility that genetic risk, in the form of polygenic risk scores (PGS), could be integrated into prediction of risk of diLQTS has great potential, although it is unknown how genetic risk is related to clinical risk factors as might be applied in clinical decision-making. In this study, we examined the PGS for QT interval in 2500 subjects exposed to a known QT-prolonging drug on prolongation of the QT interval over 500ms on subsequent ECG using electronic health record data. We found that the normalized QT PGS was higher in cases than controls (0.212±0.954 vs. -0.0270±1.003, P = 0.0002), with an unadjusted odds ratio of 1.34 (95%CI 1.17-1.53, P<0.001) for association with diLQTS. When included with age and clinical predictors of QT prolongation, we found that the PGS for QT interval provided independent risk prediction for diLQTS, in which the interaction for high-risk diagnosis or with certain high-risk medications (amiodarone, sotalol, and dofetilide) was not significant, indicating that genetic risk did not modify the effect of other risk factors on risk of diLQTS. We found that a high-risk cutoff (QT PGS ≥ 2 standard deviations above mean), but not a low-risk cutoff, was associated with risk of diLQTS after adjustment for clinical factors, and provided one method of integration based on the decision-tree framework. In conclusion, we found that PGS for QT interval is an independent predictor of diLQTS, but that in contrast to existing theories about repolarization reserve as a mechanism of increasing risk, the effect is independent of other clinical risk factors. More work is needed for external validation in clinical decision-making, as well as defining the mechanism through which genes that increase QT interval are associated with risk of diLQTS.
Asunto(s)
Electrocardiografía , Síndrome de QT Prolongado , Herencia Multifactorial , Humanos , Masculino , Femenino , Síndrome de QT Prolongado/genética , Síndrome de QT Prolongado/inducido químicamente , Persona de Mediana Edad , Herencia Multifactorial/genética , Factores de Riesgo , Anciano , Adulto , Torsades de Pointes/inducido químicamente , Torsades de Pointes/genética , Estudios de Casos y Controles , Fenetilaminas/efectos adversos , Puntuación de Riesgo Genético , SulfonamidasRESUMEN
Genetic summary data are broadly accessible and highly useful including for risk prediction, causal inference, fine mapping, and incorporation of external controls. However, collapsing individual-level data into groups masks intra- and inter-sample heterogeneity, leading to confounding, reduced power, and bias. Ultimately, unaccounted substructure limits summary data usability, especially for understudied or admixed populations. Here, we present Summix2, a comprehensive set of methods and software based on a computationally efficient mixture model to estimate and adjust for substructure in genetic summary data. In extensive simulations and application to public data, Summix2 characterizes finer-scale population structure, identifies ascertainment bias, and identifies potential regions of selection due to local substructure deviation. Summix2 increases the robust use of diverse publicly available summary data resulting in improved and more equitable research.
RESUMEN
Haplotype phasing, the process of determining which genetic variants are physically located on the same chromosome, is crucial for various genetic analyses. In this study, we first benchmark SHAPEIT and Beagle, two state-of-the-art phasing methods, on two large datasets: > 8 million diverse, research-consented 23andMe, Inc. customers and the UK Biobank (UKB). We find that both perform exceptionally well. Beagle's median switch error rate (SER) (after excluding single SNP switches) in white British trios from UKB is 0.026% compared to 0.00% for European ancestry 23andMe research participants; 55.6% of European ancestry 23andMe research participants have zero non-single SNP switches, compared to 42.4% of white British trios. South Asian ancestry 23andMe research participants have the highest median SER amongst the 23andMe populations, but it is still remarkably low at 0.46%. We also investigate the relationship between identity-by-descent (IBD) and SER, finding that switch errors tend to occur in regions of little or no IBD segment coverage. SHAPEIT and Beagle excel at 'intra-chromosomal' phasing, but lack the ability to phase across chromosomes, motivating us to develop an inter-chromosomal phasing method, called HAPTIC ( HAP lotype TI ling and C lustering), that assigns paternal and maternal variants discretely genome-wide. Our approach uses identity-by-descent (IBD) segments to phase blocks of variants on different chromosomes. HAPTIC represents the segments a focal individual shares with their relatives as nodes in a signed graph and performs bipartite clustering on the signed graph using spectral clustering. We test HAPTIC on 1022 UKB trios, yielding a median phase error of 0.08% in regions covered by IBD segments (33.5% of sites). We also ran HAPTIC in the 23andMe database and found a median phase error rate (the rate of mismatching alleles between the inferred and true phase) of 0.92% in Europeans (93.8% of sites) and 0.09% in admixed Africans (92.7% of sites). HAPTIC's precision depends heavily on data from relatives, so will increase as datasets grow larger and more diverse. HAPTIC enables analyses that require the parent-of-origin of variants, such as association studies and ancestry inference of untyped parents.
RESUMEN
Precision medicine initiatives across the globe have led to a revolution of repositories linking large-scale genomic data with electronic health records, enabling genomic analyses across the entire phenome. Many of these initiatives focus solely on research insights, leading to limited direct benefit to patients. We describe the biobank at the Colorado Center for Personalized Medicine (CCPM Biobank) that was jointly developed by the University of Colorado Anschutz Medical Campus and UCHealth to serve as a unique, dual-purpose research and clinical resource accelerating personalized medicine. This living resource currently has more than 200,000 participants with ongoing recruitment. We highlight the clinical, laboratory, regulatory, and HIPAA-compliant informatics infrastructure along with our stakeholder engagement, consent, recontact, and participant engagement strategies. We characterize aspects of genetic and geographic diversity unique to the Rocky Mountain region, the primary catchment area for CCPM Biobank participants. We leverage linked health and demographic information of the CCPM Biobank participant population to demonstrate the utility of the CCPM Biobank to replicate complex trait associations in the first 33,674 genotyped individuals across multiple disease domains. Finally, we describe our current efforts toward return of clinical genetic test results, including high-impact pathogenic variants and pharmacogenetic information, and our broader goals as the CCPM Biobank continues to grow. Bringing clinical and research interests together fosters unique clinical and translational questions that can be addressed from the large EHR-linked CCPM Biobank resource within a HIPAA- and CLIA-certified environment.
Asunto(s)
Aprendizaje del Sistema de Salud , Medicina de Precisión , Humanos , Bancos de Muestras Biológicas , Colorado , GenómicaRESUMEN
Inflammatory bowel disease (IBD) is characterized by complex etiology and a disrupted colonic ecosystem. We provide a framework for the analysis of multi-omic data, which we apply to study the gut ecosystem in IBD. Specifically, we train and validate models using data on the metagenome, metatranscriptome, virome, and metabolome from the Human Microbiome Project 2 IBD multi-omic database, with 1,785 repeated samples from 130 individuals (103 cases and 27 controls). After splitting the participants into training and testing groups, we used mixed-effects least absolute shrinkage and selection operator regression to select features for each omic. These features, with demographic covariates, were used to generate separate single-omic prediction scores. All four single-omic scores were then combined into a final regression to assess the relative importance of the individual omics and the predictive benefits when considered together. We identified several species, pathways, and metabolites known to be associated with IBD risk, and we explored the connections between data sets. Individually, metabolomic and viromic scores were more predictive than metagenomics or metatranscriptomics, and when all four scores were combined, we predicted disease diagnosis with a Nagelkerke's R2 of 0.46 and an area under the curve of 0.80 (95% confidence interval: 0.63, 0.98). Our work supports that some single-omic models for complex traits are more predictive than others, that incorporating multiple omic data sets may improve prediction, and that each omic data type provides a combination of unique and redundant information. This modeling framework can be extended to other complex traits and multi-omic data sets.IMPORTANCEComplex traits are characterized by many biological and environmental factors, such that multi-omic data sets are well-positioned to help us understand their underlying etiologies. We applied a prediction framework across multiple omics (metagenomics, metatranscriptomics, metabolomics, and viromics) from the gut ecosystem to predict inflammatory bowel disease (IBD) diagnosis. The predicted scores from our models highlighted key features and allowed us to compare the relative utility of each omic data set in single-omic versus multi-omic models. Our results emphasized the importance of metabolomics and viromics over metagenomics and metatranscriptomics for predicting IBD status. The greater predictive capability of metabolomics and viromics is likely because these omics serve as markers of lifestyle factors such as diet. This study provides a modeling framework for multi-omic data, and our results show the utility of combining multiple omic data types to disentangle complex disease etiologies and biological signatures.
Asunto(s)
Enfermedades Inflamatorias del Intestino , Microbiota , Humanos , Enfermedades Inflamatorias del Intestino/diagnóstico , Metagenómica/métodos , Fenotipo , Factores de RiesgoRESUMEN
Genome-wide association studies (GWAS) have allowed the identification of disease-associated variants, which can be leveraged to build polygenic scores (PGSs). Even though PGSs can be a valuable tool in personalized medicine, their predictive power is limited in populations of non-European ancestry, particularly in admixed populations. Recent efforts have focused on increasing racial and ethnic diversity in GWAS, thus, addressing some of the limitations of genetic risk prediction in these populations. Even with these efforts, few studies focus exclusively on Hispanics/Latinos. Additionally, Hispanic/Latino populations are often considered a single population despite varying admixture proportions between and within ethnic groups, diverse genetic heterogeneity, and demographic history. Combined with highly heterogeneous environmental and socioeconomic exposures, this diversity can reduce the transferability of genetic risk prediction models. Given the recent increase of genomic studies that include Hispanics/Latinos, we review the milestones and efforts that focus on genetic risk prediction, summarize the potential for improving PGS transferability, and highlight the challenges yet to be addressed. Additionally, we summarize social-ethical considerations and provide ideas to promote genetic risk prediction models that can be implemented equitably.
RESUMEN
Latin America continues to be severely underrepresented in genomics research, and fine-scale genetic histories and complex trait architectures remain hidden owing to insufficient data1. To fill this gap, the Mexican Biobank project genotyped 6,057 individuals from 898 rural and urban localities across all 32 states in Mexico at a resolution of 1.8 million genome-wide markers with linked complex trait and disease information creating a valuable nationwide genotype-phenotype database. Here, using ancestry deconvolution and inference of identity-by-descent segments, we inferred ancestral population sizes across Mesoamerican regions over time, unravelling Indigenous, colonial and postcolonial demographic dynamics2-6. We observed variation in runs of homozygosity among genomic regions with different ancestries reflecting distinct demographic histories and, in turn, different distributions of rare deleterious variants. We conducted genome-wide association studies (GWAS) for 22 complex traits and found that several traits are better predicted using the Mexican Biobank GWAS compared to the UK Biobank GWAS7,8. We identified genetic and environmental factors associating with trait variation, such as the length of the genome in runs of homozygosity as a predictor for body mass index, triglycerides, glucose and height. This study provides insights into the genetic histories of individuals in Mexico and dissects their complex trait architectures, both crucial for making precision and preventive medicine initiatives accessible worldwide.
Asunto(s)
Bancos de Muestras Biológicas , Genética Médica , Genoma Humano , Genómica , Hispánicos o Latinos , Humanos , Glucemia/genética , Glucemia/metabolismo , Estatura/genética , Índice de Masa Corporal , Interacción Gen-Ambiente , Marcadores Genéticos/genética , Estudio de Asociación del Genoma Completo , Hispánicos o Latinos/clasificación , Hispánicos o Latinos/genética , Homocigoto , México , Fenotipo , Triglicéridos/sangre , Triglicéridos/genética , Reino Unido , Genoma Humano/genéticaRESUMEN
The heritability explained by local ancestry markers in an admixed population (hγ2) provides crucial insight into the genetic architecture of a complex disease or trait. Estimation of hγ2 can be susceptible to biases due to population structure in ancestral populations. Here, we present heritability estimation from admixture mapping summary statistics (HAMSTA), an approach that uses summary statistics from admixture mapping to infer heritability explained by local ancestry while adjusting for biases due to ancestral stratification. Through extensive simulations, we demonstrate that HAMSTA hγ2 estimates are approximately unbiased and are robust to ancestral stratification compared to existing approaches. In the presence of ancestral stratification, we show a HAMSTA-derived sampling scheme provides a calibrated family-wise error rate (FWER) of â¼5% for admixture mapping, unlike existing FWER estimation approaches. We apply HAMSTA to 20 quantitative phenotypes of up to 15,988 self-reported African American individuals in the Population Architecture using Genomics and Epidemiology (PAGE) study. We observe hËγ2 in the 20 phenotypes range from 0.0025 to 0.033 (mean hËγ2 = 0.012 ± 9.2 × 10-4), which translates to hË2 ranging from 0.062 to 0.85 (mean hË2 = 0.30 ± 0.023). Across these phenotypes we find little evidence of inflation due to ancestral population stratification in current admixture mapping studies (mean inflation factor of 0.99 ± 0.001). Overall, HAMSTA provides a fast and powerful approach to estimate genome-wide heritability and evaluate biases in test statistics of admixture mapping studies.
Asunto(s)
Negro o Afroamericano , Genética de Población , Humanos , Mapeo Cromosómico , Fenotipo , Polimorfismo de Nucleótido Simple/genéticaRESUMEN
Inadequate representation of non-European ancestry populations in genome-wide association studies (GWAS) has limited opportunities to isolate functional variants. Fine-mapping in multi-ancestry populations should improve the efficiency of prioritizing variants for functional interrogation. To evaluate this hypothesis, we leveraged ancestry architecture to perform comparative GWAS and fine-mapping of obesity-related phenotypes in European ancestry populations from the UK Biobank (UKBB) and multi-ancestry samples from the Population Architecture for Genetic Epidemiology (PAGE) consortium with comparable sample sizes. In the investigated regions with genome-wide significant associations for obesity-related traits, fine-mapping in our ancestrally diverse sample led to 95% and 99% credible sets (CS) with fewer variants than in the European ancestry sample. Lead fine-mapped variants in PAGE regions had higher average coding scores, and higher average posterior probabilities for causality compared to UKBB. Importantly, 99% CS in PAGE loci contained strong expression quantitative trait loci (eQTLs) in adipose tissues or harbored more variants in tighter linkage disequilibrium (LD) with eQTLs. Leveraging ancestrally diverse populations with heterogeneous ancestry architectures, coupled with functional annotation, increased fine-mapping efficiency and performance, and reduced the set of candidate variants for consideration for future functional studies. Significant overlap in genetic causal variants across populations suggests generalizability of genetic mechanisms underpinning obesity-related traits across populations.
Asunto(s)
Estudio de Asociación del Genoma Completo , Obesidad , Humanos , Epidemiología Molecular , Desequilibrio de Ligamiento , Obesidad/genética , Sitios de Carácter Cuantitativo/genéticaRESUMEN
INTRODUCTION: The independent and causal cardiovascular disease risk factor lipoprotein(a) (Lp(a)) is elevated in >1.5 billion individuals worldwide, but studies have prioritised European populations. METHODS: Here, we examined how ancestrally diverse studies could clarify Lp(a)'s genetic architecture, inform efforts examining application of Lp(a) polygenic risk scores (PRS), enable causal inference and identify unexpected Lp(a) phenotypic effects using data from African (n=25 208), East Asian (n=2895), European (n=362 558), South Asian (n=8192) and Hispanic/Latino (n=8946) populations. RESULTS: Fourteen genome-wide significant loci with numerous population specific signals of large effect were identified that enabled construction of Lp(a) PRS of moderate (R2=15% in East Asians) to high (R2=50% in Europeans) accuracy. For all populations, PRS showed promise as a 'rule out' for elevated Lp(a) because certainty of assignment to the low-risk threshold was high (88.0%-99.9%) across PRS thresholds (80th-99th percentile). Causal effects of increased Lp(a) with increased glycated haemoglobin were estimated for Europeans (p value =1.4×10-6), although inverse effects in Africans and East Asians suggested the potential for heterogeneous causal effects. Finally, Hispanic/Latinos were the only population in which known associations with coronary atherosclerosis and ischaemic heart disease were identified in external testing of Lp(a) PRS phenotypic effects. CONCLUSIONS: Our results emphasise the merits of prioritising ancestral diversity when addressing Lp(a) evidence gaps.
Asunto(s)
Enfermedad de la Arteria Coronaria , Isquemia Miocárdica , Humanos , Lipoproteína(a)/genética , Lagunas en las Evidencias , Factores de Riesgo , Enfermedad de la Arteria Coronaria/diagnóstico , Enfermedad de la Arteria Coronaria/epidemiología , Enfermedad de la Arteria Coronaria/genéticaRESUMEN
Peripheral artery disease (PAD) is a form of atherosclerotic cardiovascular disease, affecting â¼8 million Americans, and is known to have racial and ethnic disparities. PAD has been reported to have a significantly higher prevalence in African Americans (AAs) compared to non-Hispanic European Americans (EAs). Hispanic/Latinos (HLs) have been reported to have lower or similar rates of PAD compared to EAs, despite having a paradoxically high burden of PAD risk factors; however, recent work suggests prevalence may differ between sub-groups. Here, we examined a large cohort of diverse adults in the BioMe biobank in New York City. We observed the prevalence of PAD at 1.7% in EAs vs. 8.5% and 9.4% in AAs and HLs, respectively, and among HL sub-groups, the prevalence was found at 11.4% and 11.5% in Puerto Rican and Dominican populations, respectively. Follow-up analysis that adjusted for common risk factors demonstrated that Dominicans had the highest increased risk for PAD relative to EAs [OR = 3.15 (95% CI 2.33-4.25), p < 6.44 × 10-14]. To investigate whether genetic factors may explain this increased risk, we performed admixture mapping by testing the association between local ancestry and PAD in Dominican BioMe participants (N = 1,813) separately from European, African, and Native American (NAT) continental ancestry tracts. The top association with PAD was an NAT ancestry tract at chromosome 2q35 [OR = 1.96 (SE = 0.16), p < 2.75 × 10-05) with 22.6% vs. 12.9% PAD prevalence in heterozygous NAT tract carriers versus non-carriers, respectively. Fine-mapping at this locus implicated tag SNP rs78529201 located within a long intergenic non-coding RNA (lincRNA) LINC00607, a gene expression regulator of key genes related to thrombosis and extracellular remodeling of endothelial cells, suggesting a putative link of the 2q35 locus to PAD etiology. Efforts to reproduce the signal in other Hispanic cohorts were unsuccessful. In summary, we showed how leveraging health system data helped understand nuances of PAD risk across HL sub-groups and admixture mapping approaches elucidated a putative risk locus in a Dominican population.
RESUMEN
An individual's disease risk is affected by the populations that they belong to, due to shared genetics and environmental factors. The study of fine-scale populations in clinical care is important for identifying and reducing health disparities and for developing personalized interventions. To assess patterns of clinical diagnoses and healthcare utilization by fine-scale populations, we leveraged genetic data and electronic medical records from 35,968 patients as part of the UCLA ATLAS Community Health Initiative. We defined clusters of individuals using identity by descent, a form of genetic relatedness that utilizes shared genomic segments arising due to a common ancestor. In total, we identified 376 clusters, including clusters with patients of Afro-Caribbean, Puerto Rican, Lebanese Christian, Iranian Jewish and Gujarati ancestry. Our analysis uncovered 1,218 significant associations between disease diagnoses and clusters and 124 significant associations with specialty visits. We also examined the distribution of pathogenic alleles and found 189 significant alleles at elevated frequency in particular clusters, including many that are not regularly included in population screening efforts. Overall, this work progresses the understanding of health in understudied communities and can provide the foundation for further study into health inequities.
Asunto(s)
Atención a la Salud , Aceptación de la Atención de Salud , Humanos , Los Angeles , Irán , EtnicidadRESUMEN
We explored ancestry-related differences in the genetic architecture of whole-blood gene expression using whole-genome and RNA sequencing data from 2,733 African Americans, Puerto Ricans and Mexican Americans. We found that heritability of gene expression significantly increased with greater proportions of African genetic ancestry and decreased with higher proportions of Indigenous American ancestry, reflecting the relationship between heterozygosity and genetic variance. Among heritable protein-coding genes, the prevalence of ancestry-specific expression quantitative trait loci (anc-eQTLs) was 30% in African ancestry and 8% for Indigenous American ancestry segments. Most anc-eQTLs (89%) were driven by population differences in allele frequency. Transcriptome-wide association analyses of multi-ancestry summary statistics for 28 traits identified 79% more gene-trait associations using transcriptome prediction models trained in our admixed population than models trained using data from the Genotype-Tissue Expression project. Our study highlights the importance of measuring gene expression across large and ancestrally diverse populations for enabling new discoveries and reducing disparities.
Asunto(s)
Negro o Afroamericano , Hispánicos o Latinos , Americanos Mexicanos , Humanos , Negro o Afroamericano/genética , Estudio de Asociación del Genoma Completo , Hispánicos o Latinos/genética , Americanos Mexicanos/genética , Fenotipo , Polimorfismo de Nucleótido Simple , TranscriptomaRESUMEN
The heritability explained by local ancestry markers in an admixed population hγ2 provides crucial insight into the genetic architecture of a complex disease or trait. Estimation of hγ2 can be susceptible to biases due to population structure in ancestral populations. Here, we present a novel approach, Heritability estimation from Admixture Mapping Summary STAtistics (HAMSTA), which uses summary statistics from admixture mapping to infer heritability explained by local ancestry while adjusting for biases due to ancestral stratification. Through extensive simulations, we demonstrate that HAMSTA hγ2 estimates are approximately unbiased and are robust to ancestral stratification compared to existing approaches. In the presence of ancestral stratification, we show a HAMSTA-derived sampling scheme provides a calibrated family-wise error rate (FWER) of ~5% for admixture mapping, unlike existing FWER estimation approaches. We apply HAMSTA to 20 quantitative phenotypes of up to 15,988 self-reported African American individuals in the Population Architecture using Genomics and Epidemiology (PAGE) study. We observe hËγ2 in the 20 phenotypes range from 0.0025 to 0.033 (mean hËγ2=0.012+/-9.2×10-4), which translates to hË2 ranging from 0.062 to 0.85 (mean hË2=0.30+/-0.023). Across these phenotypes we find little evidence of inflation due to ancestral population stratification in current admixture mapping studies (mean inflation factor of 0.99 +/- 0.001). Overall, HAMSTA provides a fast and powerful approach to estimate genome-wide heritability and evaluate biases in test statistics of admixture mapping studies.
RESUMEN
Peripheral artery disease (PAD) is a form of atherosclerotic cardiovascular disease, affecting â¼8 million Americans, and is known to have racial and ethnic disparities. PAD has been reported to have significantly higher prevalence in African Americans (AAs) compared to non-Hispanic European Americans (EAs). Hispanic/Latinos (HLs) have been reported to have lower or similar rates of PAD compared to EAs, despite having a paradoxically high burden of PAD risk factors, however recent work suggests prevalence may differ between sub-groups. Here we examined a large cohort of diverse adults in the Bio Me biobank in New York City (NYC). We observed the prevalence of PAD at 1.7% in EAs vs 8.5% and 9.4% in AAs and HLs, respectively; and among HL sub-groups, at 11.4% and 11.5% in Puerto Rican and Dominican populations, respectively. Follow-up analysis that adjusted for common risk factors demonstrated that Dominicans had the highest increased risk for PAD relative to EAs (OR=3.15 (95% CI 2.33-4.25), P <6.44×10 -14 ). To investigate whether genetic factors may explain this increased risk, we performed admixture mapping by testing the association between local ancestry (LA) and PAD in Dominican Bio Me participants (N=1,940) separately for European (EUR), African (AFR) and Native American (NAT) continental ancestry tracts. We identified a NAT ancestry tract at chromosome 2q35 that was significantly associated with PAD (OR=2.05 (95% CI 1.51-2.78), P <4.06×10 -6 ) with 22.5% vs 12.5% PAD prevalence in heterozygous NAT tract carriers versus non-carriers, respectively. Fine-mapping at this locus implicated tag SNP rs78529201 located within a long intergenic non-coding RNA (lincRNA) LINC00607 , a gene expression regulator of key genes related to thrombosis and extracellular remodeling of endothelial cells, suggesting a putative link of the 2q35 locus to PAD etiology. In summary, we showed how leveraging health systems data helped understand nuances of PAD risk across HL sub-groups and admixture mapping approaches elucidated a novel risk locus in a Dominican population.
Asunto(s)
Enteritis , Esofagitis Eosinofílica , Gastritis , Humanos , Esofagitis Eosinofílica/genéticaRESUMEN
Over 6.37 million people have died from COVID-19 worldwide, but factors influencing COVID-19-related mortality remain understudied. We aimed to describe and identify risk factors for COVID-19 mortality in the Colorado Center for Personalized Medicine (CCPM) Biobank using integrated data sources, including Electronic Health Records (EHRs). We calculated cause-specific mortality and case-fatality rates for COVID-19 and common pre-existing health conditions defined by diagnostic phecodes and encounters in EHRs. We performed multivariable logistic regression analyses of the association between each pre-existing condition and COVID-19 mortality. Of the 155,859 Biobank participants enrolled as of July 2022, 20,797 had been diagnosed with COVID-19. Of 5334 Biobank participants who had died, 190 were attributed to COVID-19. The case-fatality rate was 0.91% and the COVID-19 mortality rate was 122 per 100,000 persons. The odds of dying from COVID-19 were significantly increased among older men, and those with 14 of the 61 pre-existing conditions tested, including hypertensive chronic kidney disease (OR: 10.14, 95% CI: 5.48, 19.16) and type 2 diabetes with renal manifestations (OR: 5.59, 95% CI: 3.42, 8.97). Male patients who are older and have pre-existing kidney diseases may be at higher risk for death from COVID-19 and may require special care.
Asunto(s)
COVID-19 , Diabetes Mellitus Tipo 2 , Humanos , Masculino , Anciano , Diabetes Mellitus Tipo 2/epidemiología , SARS-CoV-2 , Colorado/epidemiología , Bancos de Muestras Biológicas , Medicina de Precisión , Factores de RiesgoRESUMEN
Groups of distantly related individuals who share a short segment of their genome identical-by-descent (IBD) can provide insights about rare traits and diseases in massive biobanks using IBD mapping. Clustering algorithms play an important role in finding these groups accurately and at scale. We set out to analyze the fitness of commonly used, fast and scalable clustering algorithms for IBD mapping applications. We designed a realistic benchmark for local IBD graphs and utilized it to compare the statistical power of clustering algorithms via simulating 2.3 million clusters across 850 experiments. We found Infomap and Markov Clustering (MCL) community detection methods to have high statistical power in most of the scenarios. They yield a 30% increase in power compared to the current state-of-art approach, with a 3 orders of magnitude lower runtime. We also found that standard clustering metrics, such as modularity, cannot predict statistical power of algorithms in IBD mapping applications. We extend our findings to real datasets by analyzing the Population Architecture using Genomics and Epidemiology (PAGE) Study dataset with 51,000 samples and 2 million shared segments on Chromosome 1, resulting in the extraction of 39 million local IBD clusters. We demonstrate the power of our approach by recovering signals of rare genetic variation in the Whole-Exome Sequence data of 200,000 individuals in the UK Biobank. We provide an efficient implementation to enable clustering at scale for IBD mapping for various populations and scenarios.Supplementary Information: The code, along with supplementary methods and figures are available at https://github.com/roohy/localIBDClustering.
Asunto(s)
Algoritmos , Biología Computacional , Humanos , Genómica , Análisis por ConglomeradosRESUMEN
Since the initial reported discovery of SARS-CoV-2 in late 2019, genomic surveillance has been an important tool to understand its transmission and evolution. Here, we sought to describe the underlying regional phylodynamics before and during a rapid spreading event that was documented by surveillance protocols of the United States Air Force Academy (USAFA) in late October-November of 2020. We used replicate long-read sequencing on Colorado SARS-CoV-2 genomes collected July through November 2020 at the University of Colorado Anschutz Medical campus in Aurora and the United States Air Force Academy in Colorado Springs. Replicate sequencing allowed rigorous validation of variation and placement in a phylogenetic relatedness network. We focus on describing the phylodynamics of a lineage that likely originated in the local Colorado Springs community and expanded rapidly over the course of two months in an outbreak within the well-controlled environment of the United States Air Force Academy. Divergence estimates from sampling dates indicate that the SARS-CoV-2 lineage associated with this rapid expansion event originated in late October 2020. These results are in agreement with transmission pathways inferred by the United States Air Force Academy, and provide a window into the evolutionary process and transmission dynamics of a potentially dangerous but ultimately contained variant.