Results 1 - 20 of 69
1.
Nat Rev Genet ; 2024 May 28.
Article in English | MEDLINE | ID: mdl-38806721

ABSTRACT

Gene-environment interactions (G × E), the interplay of genetic variation with environmental factors, have a pivotal impact on human complex traits and diseases. Statistically, G × E can be assessed by determining the deviation from expectation of predictive models based solely on the phenotypic effects of genetics or environmental exposures. Despite the unprecedented, widespread and diverse use of G × E analytical frameworks, heterogeneity in their application and reporting hinders their applicability in public health. In this Review, we discuss study design considerations as well as G × E analytical frameworks to assess polygenic liability dependent on the environment, to identify specific genetic variants exhibiting G × E, and to characterize environmental context for these dynamics. We conclude with recommendations to address the most common challenges and pitfalls in the conceptualization, methodology and reporting of G × E studies, as well as future directions.
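The "deviation from expectation" idea in this abstract can be illustrated with a small worked example. The sketch below is not taken from the Review: it assumes a binary genotype and a binary exposure, assumes the additive scale defines the no-interaction expectation, and uses made-up phenotype means. The function names are hypothetical.

```python
def additive_expectation(m00, m10, m01):
    """Expected mean phenotype for carriers who are also exposed,
    assuming genetic and environmental effects simply add.

    m00: mean for non-carriers, unexposed (baseline)
    m10: mean for carriers, unexposed
    m01: mean for non-carriers, exposed
    """
    genetic_effect = m10 - m00
    environmental_effect = m01 - m00
    return m00 + genetic_effect + environmental_effect


def interaction_contrast(m00, m10, m01, m11):
    """Observed mean in the carrier-and-exposed group (m11) minus the
    additive expectation. Zero means no G x E on the additive scale."""
    return m11 - additive_expectation(m00, m10, m01)


# Hypothetical group means, chosen only for illustration.
expected = additive_expectation(10.0, 12.0, 13.0)
contrast = interaction_contrast(10.0, 12.0, 13.0, 18.0)
print(f"additive expectation: {expected}, interaction contrast: {contrast}")
```

The choice of scale is itself an assumption: a contrast that is non-zero on the additive scale can vanish on a multiplicative scale, so a real analysis would state which scale it uses.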

2.
Cell Genom ; 4(4): 100539, 2024 Apr 10.
Article in English | MEDLINE | ID: mdl-38604127

ABSTRACT

Polygenic risk scores (PRSs) are now showing promising predictive performance on a wide variety of complex traits and diseases, but there exists a substantial performance gap across populations. We propose MUSSEL, a method for ancestry-specific polygenic prediction that borrows information in summary statistics from genome-wide association studies (GWASs) across multiple ancestry groups via Bayesian hierarchical modeling and ensemble learning. In our simulation studies and data analyses across four distinct studies, totaling 5.7 million participants with a substantial ancestral diversity, MUSSEL shows promising performance compared to alternatives. For example, MUSSEL has an average gain in prediction R² across 11 continuous traits of 40.2% and 49.3% compared to PRS-CSx and CT-SLEB, respectively, in the African ancestry population. The best-performing method, however, varies by GWAS sample size, target ancestry, trait architecture, and linkage disequilibrium reference samples; thus, ultimately a combination of methods may be needed to generate the most robust PRSs across diverse populations.


Subject(s)
Bivalvia , Multifactorial Inheritance , Humans , Animals , Multifactorial Inheritance/genetics , Genome-Wide Association Study/methods , Bayes Theorem , Phenotype , Genetic Risk Score
3.
PLoS One ; 19(4): e0301069, 2024.
Article in English | MEDLINE | ID: mdl-38669259

ABSTRACT

Nearly 300 million individuals live with chronic hepatitis B virus (HBV) infection (CHB), for which no curative therapy is available. As viral diversity is associated with pathogenesis and immunological control of infection, improved methods to characterize this diversity could aid drug development efforts. Conventionally, viral sequencing data are mapped/aligned to a reference genome, and only the aligned sequences are retained for analysis. Thus, reference selection is critical, yet selecting the most representative reference a priori remains difficult. We investigate an alternative pangenome approach which can combine multiple reference sequences into a graph which can be used during alignment. Using simulated short-read sequencing data generated from publicly available HBV genomes and real sequencing data from an individual living with CHB, we demonstrate alignment to a phylogenetically representative 'genome graph' can improve alignment, avoid issues of reference ambiguity, and facilitate the construction of sample-specific consensus sequences more genetically similar to the individual's infection. Graph-based methods can, therefore, improve efforts to characterize the genetics of viral pathogens, including HBV, and have broader implications in host-pathogen research.


Subject(s)
Consensus Sequence , Genome, Viral , Hepatitis B virus , Hepatitis B virus/genetics , Humans , Consensus Sequence/genetics , Phylogeny , Sequence Alignment/methods , Genetic Variation , Hepatitis B, Chronic/virology , DNA, Viral/genetics , Sequence Analysis, DNA/methods
4.
Nat Genet ; 56(5): 752-757, 2024 May.
Article in English | MEDLINE | ID: mdl-38684898

ABSTRACT

Health equity is the state in which everyone has fair and just opportunities to attain their highest level of health. The field of human genomics has fallen short in increasing health equity, largely because the diversity of the human population has been inadequately reflected among participants of genomics research. This lack of diversity leads to disparities that can have scientific and clinical consequences. Achieving health equity related to genomics will require greater effort in addressing inequities within the field. As part of the commitment of the National Human Genome Research Institute (NHGRI) to advancing health equity, it convened experts in genomics and health equity research to make recommendations and performed a review of current literature to identify the landscape of gaps and opportunities at the interface between human genomics and health equity research. This Perspective describes these findings and examines health equity within the context of human genomics and genomic medicine.


Subject(s)
Genomics , Health Equity , Humans , Genomics/methods , United States , Genome, Human , National Human Genome Research Institute (U.S.)
5.
Open Forum Infect Dis ; 11(3): ofae045, 2024 Mar.
Article in English | MEDLINE | ID: mdl-38524222

ABSTRACT

Background: Astroviral infections commonly cause acute nonbacterial gastroenteritis in children globally. However, these infections often go undiagnosed outside of research settings. There is no treatment available for astrovirus, and Astroviridae strain diversity presents a challenge to potential vaccine development. Methods: To address our hypothesis that host genetic risk factors are associated with astrovirus disease susceptibility, we performed a genome-wide association study of astrovirus infection in the first year of life from children enrolled in 2 Bangladeshi birth cohorts. Results: We identified a novel region on chromosome 1 near the loricrin gene (LOR) associated with astrovirus diarrheal infection (rs75437404; meta-analysis P = 8.82 × 10⁻⁹; A allele odds ratio, 2.71) and on chromosome 10 near the prolactin releasing hormone receptor gene (PRLHR) (rs75935441; meta-analysis P = 1.33 × 10⁻⁸; C allele odds ratio, 4.17). The prolactin-releasing peptide has been shown to influence feeding patterns and energy balance in mice. In addition, several single-nucleotide polymorphisms in the chromosome 1 locus have previously been associated with expression of innate immune system genes PGLYRP4, S100A9, and S100A12. Conclusions: This study identified 2 significant host genetic regions that may influence astrovirus diarrhea susceptibility and should be considered in further studies.

6.
Nature ; 622(7984): 775-783, 2023 Oct.
Article in English | MEDLINE | ID: mdl-37821706

ABSTRACT

Latin America continues to be severely underrepresented in genomics research, and fine-scale genetic histories and complex trait architectures remain hidden owing to insufficient data¹. To fill this gap, the Mexican Biobank project genotyped 6,057 individuals from 898 rural and urban localities across all 32 states in Mexico at a resolution of 1.8 million genome-wide markers with linked complex trait and disease information creating a valuable nationwide genotype-phenotype database. Here, using ancestry deconvolution and inference of identity-by-descent segments, we inferred ancestral population sizes across Mesoamerican regions over time, unravelling Indigenous, colonial and postcolonial demographic dynamics²⁻⁶. We observed variation in runs of homozygosity among genomic regions with different ancestries reflecting distinct demographic histories and, in turn, different distributions of rare deleterious variants. We conducted genome-wide association studies (GWAS) for 22 complex traits and found that several traits are better predicted using the Mexican Biobank GWAS compared to the UK Biobank GWAS⁷,⁸. We identified genetic and environmental factors associating with trait variation, such as the length of the genome in runs of homozygosity as a predictor for body mass index, triglycerides, glucose and height. This study provides insights into the genetic histories of individuals in Mexico and dissects their complex trait architectures, both crucial for making precision and preventive medicine initiatives accessible worldwide.


Subject(s)
Biological Specimen Banks , Genetics, Medical , Genome, Human , Genomics , Hispanic or Latino , Humans , Blood Glucose/genetics , Blood Glucose/metabolism , Body Height/genetics , Body Mass Index , Gene-Environment Interaction , Genetic Markers/genetics , Genome-Wide Association Study , Hispanic or Latino/classification , Hispanic or Latino/genetics , Homozygote , Mexico , Phenotype , Triglycerides/blood , Triglycerides/genetics , United Kingdom , Genome, Human/genetics
7.
Am J Hum Genet ; 110(11): 1853-1862, 2023 11 02.
Article in English | MEDLINE | ID: mdl-37875120

ABSTRACT

The heritability explained by local ancestry markers in an admixed population (h_γ²) provides crucial insight into the genetic architecture of a complex disease or trait. Estimation of h_γ² can be susceptible to biases due to population structure in ancestral populations. Here, we present heritability estimation from admixture mapping summary statistics (HAMSTA), an approach that uses summary statistics from admixture mapping to infer heritability explained by local ancestry while adjusting for biases due to ancestral stratification. Through extensive simulations, we demonstrate that HAMSTA h_γ² estimates are approximately unbiased and are robust to ancestral stratification compared to existing approaches. In the presence of ancestral stratification, we show a HAMSTA-derived sampling scheme provides a calibrated family-wise error rate (FWER) of ∼5% for admixture mapping, unlike existing FWER estimation approaches. We apply HAMSTA to 20 quantitative phenotypes of up to 15,988 self-reported African American individuals in the Population Architecture using Genomics and Epidemiology (PAGE) study. We observe ĥ_γ² in the 20 phenotypes range from 0.0025 to 0.033 (mean ĥ_γ² = 0.012 ± 9.2 × 10⁻⁴), which translates to ĥ² ranging from 0.062 to 0.85 (mean ĥ² = 0.30 ± 0.023). Across these phenotypes we find little evidence of inflation due to ancestral population stratification in current admixture mapping studies (mean inflation factor of 0.99 ± 0.001). Overall, HAMSTA provides a fast and powerful approach to estimate genome-wide heritability and evaluate biases in test statistics of admixture mapping studies.


Subject(s)
Black or African American , Genetics, Population , Humans , Chromosome Mapping , Phenotype , Polymorphism, Single Nucleotide/genetics
8.
Clin J Am Soc Nephrol ; 18(11): 1416-1425, 2023 11 01.
Article in English | MEDLINE | ID: mdl-37533140

ABSTRACT

BACKGROUND: Sickle cell trait affects approximately 8% of Black individuals in the United States, along with many other individuals with ancestry from malaria-endemic regions worldwide. While traditionally considered a benign condition, recent evidence suggests that sickle cell trait is associated with lower eGFR and higher risk of kidney diseases, including kidney failure. The mechanisms underlying these associations remain poorly understood. We used proteomic profiling to gain insight into the pathobiology of sickle cell trait. METHODS: We measured proteomics (N = 1285 proteins assayed by Olink Explore) using baseline plasma samples from 592 Black participants with sickle cell trait and 1:1 age-matched Black participants without sickle cell trait from the prospective Women's Health Initiative cohort. Age-adjusted linear regression was used to assess the association between protein levels and sickle cell trait. RESULTS: In age-adjusted models, 35 proteins were significantly associated with sickle cell trait after correction for multiple testing. Several of the sickle cell trait-protein associations were replicated in Black participants from two independent cohorts (Atherosclerosis Risk in Communities study and Jackson Heart Study) assayed using an orthogonal aptamer-based proteomic platform (SomaScan). Many of the validated sickle cell trait-associated proteins are known biomarkers of kidney function or injury (e.g., hepatitis A virus cellular receptor 1 [HAVCR1]/kidney injury molecule-1 [KIM-1], uromodulin [UMOD], ephrins), related to red cell physiology or hemolysis (erythropoietin [EPO], heme oxygenase 1 [HMOX1], and α-hemoglobin stabilizing protein) and/or inflammation (fractalkine, C-C motif chemokine ligand 2/monocyte chemoattractant protein-1 [MCP-1], and urokinase plasminogen activator surface receptor [PLAUR]).
A protein risk score constructed from the top sickle cell trait-associated biomarkers was associated with incident kidney failure among those with sickle cell trait during Women's Health Initiative follow-up (odds ratio, 1.32; 95% confidence interval, 1.10 to 1.58). CONCLUSIONS: We identified and replicated the association of sickle cell trait with a number of plasma proteins related to hemolysis, kidney injury, and inflammation.


Subject(s)
Renal Insufficiency , Sickle Cell Trait , Humans , Female , United States , Proteome , Prospective Studies , Hemolysis , Proteomics , Biomarkers , Inflammation
9.
Front Genet ; 14: 1181167, 2023.
Article in English | MEDLINE | ID: mdl-37600667

ABSTRACT

Peripheral artery disease (PAD) is a form of atherosclerotic cardiovascular disease, affecting ∼8 million Americans, and is known to have racial and ethnic disparities. PAD has been reported to have a significantly higher prevalence in African Americans (AAs) compared to non-Hispanic European Americans (EAs). Hispanic/Latinos (HLs) have been reported to have lower or similar rates of PAD compared to EAs, despite having a paradoxically high burden of PAD risk factors; however, recent work suggests prevalence may differ between sub-groups. Here, we examined a large cohort of diverse adults in the BioMe biobank in New York City. We observed the prevalence of PAD at 1.7% in EAs vs. 8.5% and 9.4% in AAs and HLs, respectively, and among HL sub-groups, the prevalence was found at 11.4% and 11.5% in Puerto Rican and Dominican populations, respectively. Follow-up analysis that adjusted for common risk factors demonstrated that Dominicans had the highest increased risk for PAD relative to EAs [OR = 3.15 (95% CI 2.33-4.25), p < 6.44 × 10⁻¹⁴]. To investigate whether genetic factors may explain this increased risk, we performed admixture mapping by testing the association between local ancestry and PAD in Dominican BioMe participants (N = 1,813) separately from European, African, and Native American (NAT) continental ancestry tracts. The top association with PAD was an NAT ancestry tract at chromosome 2q35 [OR = 1.96 (SE = 0.16), p < 2.75 × 10⁻⁵] with 22.6% vs. 12.9% PAD prevalence in heterozygous NAT tract carriers versus non-carriers, respectively. Fine-mapping at this locus implicated tag SNP rs78529201 located within a long intergenic non-coding RNA (lincRNA) LINC00607, a gene expression regulator of key genes related to thrombosis and extracellular remodeling of endothelial cells, suggesting a putative link of the 2q35 locus to PAD etiology. Efforts to reproduce the signal in other Hispanic cohorts were unsuccessful.
In summary, we showed how leveraging health system data helped understand nuances of PAD risk across HL sub-groups and admixture mapping approaches elucidated a putative risk locus in a Dominican population.

10.
medRxiv ; 2023 Jun 29.
Article in English | MEDLINE | ID: mdl-37425840

ABSTRACT

Hepatitis B virus (HBV) vaccine escape mutants (VEM) are increasingly described, threatening progress in control of this virus worldwide. Here we studied the relationship between host genetic variation, vaccine immunogenicity and viral sequences implicating VEM emergence. In a cohort of 1,096 Bangladeshi children, we identified human leukocyte antigen (HLA) variants associated with response to vaccine antigens. Using an HLA imputation panel with 9,448 South Asian individuals, DPB1*04:01 was associated with higher HBV antibody responses (p = 4.5 × 10⁻³⁰). The underlying mechanism is a result of higher affinity binding of HBV surface antigen epitopes to DPB1*04:01 dimers. This is likely a result of evolutionary pressure at the HBV surface antigen 'a-determinant' segment incurring VEM specific to HBV. Prioritizing pre-S isoform HBV vaccines may tackle the rise of HBV vaccine evasion.

11.
Trends Genet ; 39(11): 813-815, 2023 Nov.
Article in English | MEDLINE | ID: mdl-37524625

ABSTRACT

Polygenic scores (PGSs) aggregate the effects of variants across the genome to estimate genetic liability, but have lower performance in external study populations. A new study by Ding et al. has applied a novel framework to estimate the individual-level predictive accuracy of PGSs, and demonstrates that performance reduction occurs linearly with genetic distance.
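As a minimal sketch of what "aggregate the effects of variants" means in the abstract above, a polygenic score is commonly computed as a weighted sum of allele dosages. The code below is illustrative only and is not the framework of Ding et al.; the function name, the dosages, and the per-variant weights are all hypothetical.

```python
def polygenic_score(dosages, weights):
    """Weighted sum of allele dosages.

    dosages: effect-allele counts per variant (0, 1, or 2)
    weights: per-variant effect-size estimates, e.g. from a GWAS
    """
    if len(dosages) != len(weights):
        raise ValueError("dosages and weights must have the same length")
    return sum(d * w for d, w in zip(dosages, weights))


# Hypothetical individual and hypothetical effect sizes.
dosages = [2, 1, 0]
weights = [0.5, -0.25, 0.125]
print(polygenic_score(dosages, weights))
```

Because the weights are estimated in one study population, the same sum can predict less well in individuals who are genetically distant from that population, which is the performance drop the abstract describes.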

13.
bioRxiv ; 2023 Apr 18.
Article in English | MEDLINE | ID: mdl-37131817

ABSTRACT

The heritability explained by local ancestry markers in an admixed population (h_γ²) provides crucial insight into the genetic architecture of a complex disease or trait. Estimation of h_γ² can be susceptible to biases due to population structure in ancestral populations. Here, we present a novel approach, Heritability estimation from Admixture Mapping Summary STAtistics (HAMSTA), which uses summary statistics from admixture mapping to infer heritability explained by local ancestry while adjusting for biases due to ancestral stratification. Through extensive simulations, we demonstrate that HAMSTA h_γ² estimates are approximately unbiased and are robust to ancestral stratification compared to existing approaches. In the presence of ancestral stratification, we show a HAMSTA-derived sampling scheme provides a calibrated family-wise error rate (FWER) of ~5% for admixture mapping, unlike existing FWER estimation approaches. We apply HAMSTA to 20 quantitative phenotypes of up to 15,988 self-reported African American individuals in the Population Architecture using Genomics and Epidemiology (PAGE) study. We observe ĥ_γ² in the 20 phenotypes range from 0.0025 to 0.033 (mean ĥ_γ² = 0.012 ± 9.2 × 10⁻⁴), which translates to ĥ² ranging from 0.062 to 0.85 (mean ĥ² = 0.30 ± 0.023). Across these phenotypes we find little evidence of inflation due to ancestral population stratification in current admixture mapping studies (mean inflation factor of 0.99 ± 0.001). Overall, HAMSTA provides a fast and powerful approach to estimate genome-wide heritability and evaluate biases in test statistics of admixture mapping studies.

14.
bioRxiv ; 2023 Sep 21.
Article in English | MEDLINE | ID: mdl-37090648

ABSTRACT

Polygenic risk scores (PRS) are now showing promising predictive performance on a wide variety of complex traits and diseases, but there exists a substantial performance gap across different populations. We propose MUSSEL, a method for ancestry-specific polygenic prediction that borrows information in the summary statistics from genome-wide association studies (GWAS) across multiple ancestry groups. MUSSEL conducts Bayesian hierarchical modeling under a MUltivariate Spike-and-Slab model for effect-size distribution and incorporates an Ensemble Learning step using super learner to combine information across different tuning parameter settings and ancestry groups. In our simulation studies and data analyses of 16 traits across four distinct studies, totaling 5.7 million participants with a substantial ancestral diversity, MUSSEL shows promising performance compared to alternatives. The method, for example, has an average gain in prediction R² across 11 continuous traits of 40.2% and 49.3% compared to PRS-CSx and CT-SLEB, respectively, in the African Ancestry population. The best-performing method, however, varies by GWAS sample size, target ancestry, underlying trait architecture, and the choice of reference samples for LD estimation, and thus ultimately, a combination of methods may be needed to generate the most robust PRS across diverse populations.

15.
medRxiv ; 2023 Mar 29.
Article in English | MEDLINE | ID: mdl-37034679

ABSTRACT

Peripheral artery disease (PAD) is a form of atherosclerotic cardiovascular disease, affecting ∼8 million Americans, and is known to have racial and ethnic disparities. PAD has been reported to have significantly higher prevalence in African Americans (AAs) compared to non-Hispanic European Americans (EAs). Hispanic/Latinos (HLs) have been reported to have lower or similar rates of PAD compared to EAs, despite having a paradoxically high burden of PAD risk factors; however, recent work suggests prevalence may differ between sub-groups. Here we examined a large cohort of diverse adults in the BioMe biobank in New York City (NYC). We observed the prevalence of PAD at 1.7% in EAs vs 8.5% and 9.4% in AAs and HLs, respectively; and among HL sub-groups, at 11.4% and 11.5% in Puerto Rican and Dominican populations, respectively. Follow-up analysis that adjusted for common risk factors demonstrated that Dominicans had the highest increased risk for PAD relative to EAs (OR = 3.15 (95% CI 2.33-4.25), P < 6.44 × 10⁻¹⁴). To investigate whether genetic factors may explain this increased risk, we performed admixture mapping by testing the association between local ancestry (LA) and PAD in Dominican BioMe participants (N = 1,940) separately for European (EUR), African (AFR) and Native American (NAT) continental ancestry tracts. We identified a NAT ancestry tract at chromosome 2q35 that was significantly associated with PAD (OR = 2.05 (95% CI 1.51-2.78), P < 4.06 × 10⁻⁶) with 22.5% vs 12.5% PAD prevalence in heterozygous NAT tract carriers versus non-carriers, respectively. Fine-mapping at this locus implicated tag SNP rs78529201 located within a long intergenic non-coding RNA (lincRNA) LINC00607, a gene expression regulator of key genes related to thrombosis and extracellular remodeling of endothelial cells, suggesting a putative link of the 2q35 locus to PAD etiology.
In summary, we showed how leveraging health systems data helped understand nuances of PAD risk across HL sub-groups and admixture mapping approaches elucidated a novel risk locus in a Dominican population.

16.
Nat Genet ; 55(4): 549-558, 2023 04.
Article in English | MEDLINE | ID: mdl-36941441

ABSTRACT

Individuals of admixed ancestries (for example, African Americans) inherit a mosaic of ancestry segments (local ancestry) originating from multiple continental ancestral populations. This offers the unique opportunity of investigating the similarity of genetic effects on traits across ancestries within the same population. Here we introduce an approach to estimate correlation of causal genetic effects (r_admix) across local ancestries and analyze 38 complex traits in African-European admixed individuals (N = 53,001) to observe very high correlations (meta-analysis r_admix = 0.95, 95% credible interval 0.93-0.97), much higher than correlation of causal effects across continental ancestries. We replicate our results using regression-based methods from marginal genome-wide association study summary statistics. We also report realistic scenarios where regression-based methods yield inflated heterogeneity-by-ancestry due to ancestry-specific tagging of causal effects, and/or polygenicity. Our results motivate genetic analyses that assume minimal heterogeneity in causal effects by ancestry, with implications for the inclusion of ancestry-diverse individuals in studies.


Subject(s)
Genetics, Population , Multifactorial Inheritance , Humans , Multifactorial Inheritance/genetics , Genome-Wide Association Study/methods , Racial Groups/genetics , Black or African American/genetics , Polymorphism, Single Nucleotide/genetics
17.
J Infect Dis ; 228(8): 979-989, 2023 10 18.
Article in English | MEDLINE | ID: mdl-36967705

ABSTRACT

BACKGROUND: Diarrhea is the second leading cause of death in children under 5 years old worldwide. Known diarrhea risk factors include sanitation, water sources, and pathogens but do not fully explain the heterogeneity in frequency and duration of diarrhea in young children. We evaluated the role of host genetics in diarrhea. METHODS: Using 3 well-characterized birth cohorts from an impoverished area of Dhaka, Bangladesh, we compared infants with no diarrhea in the first year of life to those with an abundance, measured by either frequency or duration. We performed a genome-wide association analysis for each cohort under an additive model and then meta-analyzed across the studies. RESULTS: For diarrhea frequency, we identified 2 genome-wide significant loci associated with not having any diarrhea, on chromosome 21 within the noncoding RNA AP000959 (C allele odds ratio [OR] = 0.31, P = 4.01 × 10⁻⁸), and on chromosome 8 within SAMD12 (T allele OR = 0.35, P = 4.74 × 10⁻⁷). For duration of diarrhea, we identified 2 loci associated with no diarrhea, including the same locus on chromosome 21 (C allele OR = 0.31, P = 1.59 × 10⁻⁸) and another locus on chromosome 17 near WSCD1 (C allele OR = 0.35, P = 1.09 × 10⁻⁷). CONCLUSIONS: These loci are in or near genes involved in enteric nervous system development and intestinal inflammation and may be potential targets for diarrhea therapeutics.


Subject(s)
Diarrhea , Genome-Wide Association Study , Child , Humans , Infant , Child, Preschool , Bangladesh/epidemiology , Risk Factors , Diarrhea/epidemiology , Diarrhea/genetics , Alleles
18.
Clin Cancer Res ; 29(7): 1194-1199, 2023 04 03.
Article in English | MEDLINE | ID: mdl-36638200

ABSTRACT

Despite much vaunted progress in cancer therapeutics and diagnostics, outcomes for many groups of non-White patients with cancer remain worse than those for their White compatriots. One reason for this is the lack of inclusion and representation of non-White patients in clinical trials, preclinical datasets, and among researchers, a shortfall that is gaining wide recognition within the cancer research community and the lay public. Several reviews and editorials have commented on the negative impacts of the status quo on progress in cancer research toward medical breakthroughs that help all communities and not just White patients with cancer. In this perspective, we describe the existence of research silos focused either on the impact of socioeconomic factors proceeding from systemic racism on cancer outcomes, or on genetic ancestry as it affects the molecular biology of cancer developing in specific patient populations. While both these research areas are critical for progress toward precision medicine equity, breaking down these silos will help us gain an integrated understanding of how race and racism impact cancer development, progression, and patient outcomes. Bringing this comprehensive approach to cancer disparities research will undoubtedly improve our overall understanding of how stress and environmental factors affect the molecular biology of cancer, which will lead to the development of new diagnostics and therapeutics that are applicable across cancer patient demographics.


Subject(s)
Neoplasms , Vulnerable Populations , Humans , Neoplasms/diagnosis , Neoplasms/epidemiology , Neoplasms/genetics
19.
bioRxiv ; 2023 Jan 12.
Article in English | MEDLINE | ID: mdl-36711598

ABSTRACT

Hepatitis B virus (HBV) remains a global public health concern, with over 250 million individuals living with chronic HBV infection (CHB) and no curative therapy currently available. Viral diversity is associated with CHB pathogenesis and immunological control of infection. Improved methods to characterize the viral genome at both the population and intra-host level could aid drug development efforts. Conventionally, HBV sequencing data are aligned to a linear reference genome and only sequences capable of aligning to the reference are captured for analysis. Reference selection has additional consequences, including sample-specific 'consensus' sequence construction. It remains unclear how to select a reference from available sequences and whether a single reference is sufficient for genetic analyses. Using simulated short-read sequencing data generated from full-length publicly available HBV genome sequences and HBV sequencing data from a longitudinally sampled individual with CHB, we investigate alternative graph-based alignment approaches. We demonstrate that using a phylogenetically representative 'genome graph' for alignment, rather than linear reference sequences, avoids issues of reference ambiguity, improves alignment, and facilitates the construction of sample-specific consensus sequences genetically similar to an individual's infection. Graph-based methods can therefore improve efforts to characterize the genetics of viral pathogens, including HBV, and may have broad implications in host pathogen research.

20.
Am J Hum Genet ; 110(2): 336-348, 2023 02 02.
Article in English | MEDLINE | ID: mdl-36649706

ABSTRACT

Genome-wide association studies (GWASs) have been performed to identify host genetic factors for a range of phenotypes, including for infectious diseases. The use of population-based common control subjects from biobanks and extensive consortia is a valuable resource to increase sample sizes in the identification of associated loci with minimal additional expense. Non-differential misclassification of the outcome has been reported when the control subjects are not well characterized, which often attenuates the true effect size. However, for infectious diseases the comparison of affected subjects to population-based common control subjects regardless of pathogen exposure can also result in selection bias. Through simulated comparisons of pathogen-exposed cases and population-based common control subjects, we demonstrate that not accounting for pathogen exposure can result in biased effect estimates and spurious genome-wide significant signals. Further, the observed association can be distorted depending upon strength of the association between a locus and pathogen exposure and the prevalence of pathogen exposure. We also used a real data example from the hepatitis C virus (HCV) genetic consortium comparing HCV spontaneous clearance to persistent infection with both well-characterized control subjects and population-based common control subjects from the UK Biobank. We find biased effect estimates for known HCV clearance-associated loci and potentially spurious HCV clearance associations. These findings suggest that the choice of control subjects is especially important for infectious diseases or outcomes that are conditional upon environmental exposures.


Subject(s)
Communicable Diseases , Hepatitis C , Humans , Genome-Wide Association Study , Communicable Diseases/genetics , Phenotype , Hepatitis C/genetics , Hepacivirus